Abstract

We introduce and study a class of exchangeable random graph ensembles. They can 
be used as statistical null models for empirical networks, and as a tool for theoretical 
investigations. We provide general theorems that characterize the degree distribution of 
the ensemble graphs, together with some features that are important for applications, 
such as subgraph distributions and kernel of the adjacency matrix. A particular case of 
directed networks with power-law out-degree is studied in more detail, as an example of 
the flexibility of the model in applications.

1 Introduction

Random graphs have attracted much interest as null and positive models for many real-world systems
involving many interacting agents, such as the internet, epidemics, and social and biological interactions
(see, for instance, [55, 45, 47, 43]). In many of these instances, one is naturally confronted with
properties that differ from those of the classical Erdős-Rényi model. We recall that, in the Erdős-Rényi model,
edges in the graph exist independently of each other, with a fixed probability (dependent on the
dimension of the graph). While for the Erdős-Rényi model analytical expressions for many of the
relevant observable properties of the graph (such as the diameter, clustering coefficient, component
size distributions, subgraph distribution, giant component, etc.) are available, less is known for other
kinds of models. In recent years, in connection with the availability of large-scale data on real-life
networks, many studies addressing random graph models going beyond the Erdős-Rényi model have
appeared. Two studies that are worth mentioning are the so-called "small-world" model [56] and the
preferential-attachment model [5], addressing the empirically observable phenomena of short shortest
paths and power-law degree distributions respectively. This new wave of models has also affected the
mathematical literature (see, for instance refs [Tl[l|3[T2l|8l[M[IIl[Il[l5l[lZlll5l|4g). Among the 
many recent mathematical books on the subject, we would like to mention, for classical random graph
theory, [33] and [7], and, for more recent models of random graphs, [16] and [19]. From a statistical
point of view, which we adopt here, it is natural to seek a parameterizable stochastic model of complex
graphs that is at the same time flexible for practical use and mathematically tractable for
theoretical exploration. Moreover, it is desirable that the qualitative properties of the model
emerge from some simple unifying mathematical structure rather than from ad hoc considerations,
see [2l|7l[12||l[T0l[15]|lZ]. 

The aim of this paper is to present a general class of random graphs that addresses these
needs. It was introduced in [6] in a particular case, connected to the study of null models for
transcriptional regulation networks [1]. The defining property of the graph ensemble is the exchangeable
structure of its degree correlations. This symmetry property makes it particularly apt to be used as a
statistical null model. The most important advantages of such an approach are the following: (i) much
as in the Erdős-Rényi model, some observables can be easily computed analytically for finite sizes
and asymptotically, rather than estimated numerically; (ii) it is fast and versatile in computational
implementations and statistical applications. As we will show in the different sections of this paper,
many observables that are commonly useful in the analysis of large-scale networks are particularly
simple to access with our ensemble. In order to show the range of applicability, we discuss multiple
applications to observables in the model graphs rather than presenting a very detailed analysis of
a single graph feature. When used as a null model, unlike other approaches used in the
study of transcriptional and other networks [31, 50, 13], our generating method for random graphs is
not designed to conserve the degree sequence of the observed real graph, but rather to
generate graphs with degree distributions having certain prescribed properties.

The paper is structured as follows. Section 2 introduces a rather general class of random
directed network ensembles that can be produced with the same defining principle of exchangeability,
and discusses some simple variants. The following part is intended to show how the structure of the
proposed model is useful in the study of many relevant topological features of the ensemble. To
this aim, in Section 3 we prove some theorems which characterize the degree distributions and the
distribution of the size of the "hub" (or the maximally connected node). In particular, we show that
the model can generate an ensemble characterized by a Poisson limit distribution for the in-degree,
and a mixture of Poisson limit distributions for the out-degree. This important property makes it
possible to obtain a limit out-degree distribution with power-law tails. In the same section, we show that
the probability that the graph is disconnected goes to 1 as the size of the graph diverges. Section 4
gives some results concerning the mean number of subgraphs (a quantity of some importance in many
applications), roots and leaves. Section 5 considers a particular Boolean optimization problem defined
on the graph, which emerges in statistical physics and theoretical computer science. More precisely,
we will give some results concerning the non-trivial problem of the dimension of the kernel of the
adjacency matrix. In Section 6 we briefly comment on the two variants of the main model. Finally,
Section 7 contains the detailed analysis of a simple two-parameter ensemble derived from the general
model presented in Section 2. Some of the proofs are deferred to the Appendix.

2 The Model 

Although the ideas we describe are applicable to both directed and undirected graphs, we will mainly
consider here the case of directed graphs. Any directed random graph $G_n$ with $n$ nodes is completely
specified by its adjacency matrix $X_n = X(G_n) = [X_{i,j}^{(n)}]_{i,j=1,\dots,n}$, where $X_{i,j}^{(n)} = 1$ if there is a directed
edge $i \to j$, and $0$ otherwise. In many applications, such as transcription networks, instead of square
matrices one may also consider rectangular matrices. The reason for this is that in some situations it
is reasonable to assume that, while all nodes can receive edges, only a fraction of nodes can send them
out (see [6] for an introduction to this problem). Hence, in what follows we will deal with rectangular
$m_n \times n$ matrices. As we will see in Section 7, this is a necessary choice for networks with power-law
degree distributions having exponent equal to or lower than 2 (thus with diverging average) to obtain
non-trivial asymptotics.

One of the interesting features of our procedure is that it can produce graphs with different in-
and out-degree distributions. Naturally, if the graph is generated by drawing each
directed edge independently with a fixed probability, as in the case of (undirected) Erdős-Rényi graphs, this is
not possible. In order to build a random graph with different in- and out-degree distributions, one
must give up total independence and allow some kind of dependence among edges. In particular,
maintaining the maximal symmetry leads to the choice of exchangeability.

2.1 Partially Exchangeable Random Graphs

The first general class we will consider includes directed graphs whose in- or out-degrees, i.e. the
columns or the rows of $X_n$, are exchangeable, while the out- or in-degrees are stochastically independent.
Put differently, our model ensemble can be defined using the following generative algorithm.
For each row of $X_n$, independently, (i) draw a bias $\theta$ from a prescribed probability distribution $\pi_n$
on $[0,1]$ and (ii) set the row elements of $X_n$ to be 0 or 1 according to the toss of a coin with bias $\theta$.
Since each row is drawn independently, the resulting probability law is

$$P\{X_{i,j}^{(n)} = e_{i,j},\ i = 1,\dots,m_n,\ j = 1,\dots,n\} = \prod_{i=1}^{m_n} \int_{[0,1]} \theta_i^{\sum_{j=1}^{n} e_{i,j}} (1-\theta_i)^{n - \sum_{j=1}^{n} e_{i,j}}\, \pi_n(d\theta_i) \tag{1}$$

where $e_{i,j} \in \{0,1\}$, $i = 1,\dots,m_n$, $j = 1,\dots,n$. In other words, each row of $X(G_n)$ is independent of the others
with exchangeable law directed by $\pi_n$. One can apply an identical procedure to the transposed matrix
of $X_n$ and switch the role of in- and out-degrees.
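
The two-step generative algorithm above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_partially_exchangeable` and the callable `draw_bias` (any user-supplied sampler for $\pi_n$) are assumptions introduced here, not part of the original construction.

```python
import random


def sample_partially_exchangeable(m, n, draw_bias, rng=None):
    """Sample an m x n 0/1 adjacency matrix with independent, exchangeable rows.

    draw_bias(rng) is a user-supplied (hypothetical) sampler for pi_n that
    returns a bias theta in [0, 1].
    """
    rng = rng or random.Random()
    matrix = []
    for _ in range(m):
        theta = draw_bias(rng)  # step (i): one bias per row
        # step (ii): n independent coin tosses sharing that bias
        matrix.append([1 if rng.random() < theta else 0 for _ in range(n)])
    return matrix
```

With a constant bias, e.g. `sample_partially_exchangeable(5, 20, lambda r: 0.1)`, every edge is drawn independently with the same probability; a random `draw_bias` makes the out-degrees fluctuate from row to row while the columns remain exchangeable.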

It is worth recalling that a random vector, say $(Y_1,\dots,Y_n)$, is said to be exchangeable if its
law is invariant under any permutation, that is, if for any permutation $\sigma$ of $\{1,\dots,n\}$, $(Y_1,\dots,Y_n)$
and $(Y_{\sigma(1)},\dots,Y_{\sigma(n)})$ have the same law. For an introduction to exchangeable sequences and arrays
see, e.g., [3]. This hypothesis is important for the use of the ensemble to produce statistical null
models, as it implies symmetry of the probability distributions with respect to the permutation of
variables, i.e. all the nodes or the agents they represent (genes, computer routers, etc.) are given an
equivalent status.

To complete the model, one has to specify the choice for $\pi_n$, which determines the behavior
of the graph ensemble. For example, in [6], we have chosen the two-parameter distribution

$$\pi_n(d\theta) = Z_n^{-1}\, \theta^{-\beta}\, I_{(\alpha/n,1]}(\theta)\, d\theta \tag{2}$$

where $n > \alpha > 0$ and $\beta > 1$ are free parameters, $I_{(\alpha/n,1]}$ is the indicator function of the interval $(\alpha/n, 1]$,
taking the value one inside the interval and zero everywhere else, and $Z_n := ((n/\alpha)^{\beta-1} - 1)/(\beta - 1)$
is the normalization constant. As we will see in Section 7, this choice produces a graph ensemble
with heavy-tailed degree sequences. As a second example, taking $\pi_n(d\theta) = \delta_{\lambda/n}(d\theta)$, the point mass at $\lambda/n$, one obtains a
directed version of the Erdős-Rényi graph.
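
Because the density in (2) has an explicit distribution function, a bias can be drawn by inverting it. The sketch below is a hedged illustration of that idea; the function name `draw_power_law_bias` and the `rng` argument are hypothetical, and the code assumes $n > \alpha > 0$ and $\beta > 1$ as in the text.

```python
import random


def draw_power_law_bias(n, alpha, beta, rng):
    """Draw theta from the density (2), proportional to theta^(-beta) on (alpha/n, 1]."""
    a = (n / alpha) ** (beta - 1.0)   # equals (alpha/n)^(1 - beta)
    u = rng.random()                  # uniform on [0, 1)
    # The distribution function is (a - theta^(1 - beta)) / (a - 1);
    # setting it equal to u and solving for theta gives the line below.
    return (a - u * (a - 1.0)) ** (-1.0 / (beta - 1.0))
```

A call such as `draw_power_law_bias(1000, 2.0, 2.5, random.Random())` returns a value in $(\alpha/n, 1]$; most draws are close to $\alpha/n$, with occasional large biases producing high-degree rows.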

A naturally interesting problem is to characterize the general forms of the probability measure
$\pi_n$ that lead to graph ensembles with qualitatively different characteristics. In Section 3 we shall give
some results in this direction. Note that a general way of producing the distribution $\pi_n$ for each $n$,
starting from a given "seed" $F$ ($F$ being a fixed distribution function on $\mathbb{R}^+$), is easily described by
the following assumption:

$$F_n(x) := F(xn)/F(n) = \int_{[0,x]} \pi_n(d\theta) \qquad (0 \le x \le 1). \tag{3}$$

With the above assumption, $F_n$ is a well-defined distribution function on $[0,1]$ whenever $F(n) > 0$,
which certainly holds for large enough values of $n$.
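
Assumption (3) says that the bias has the law of $T/n$, where $T$ is distributed according to $F$ conditioned on $T \le n$. As a minimal sketch, assuming the exponential seed $F(x) = 1 - e^{-\gamma x}$ (chosen here only because its conditional law can be inverted in closed form), a bias can be drawn as follows; the function name is hypothetical.

```python
import math
import random


def draw_bias_from_exponential_seed(n, gamma, rng):
    """Draw a bias with distribution function F(x n) / F(n), F(x) = 1 - exp(-gamma x)."""
    f_n = -math.expm1(-gamma * n)         # F(n)
    u = rng.random()                      # uniform on [0, 1)
    t = -math.log1p(-u * f_n) / gamma     # solves F(t) = u F(n), so 0 <= t < n
    return t / n                          # rescale to the unit interval
```

Other seeds can be handled in the same way whenever $F$ can be inverted, or by rejection (draw from $F$ until the value does not exceed $n$).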

2.2 Completely exchangeable graphs 

The method of generating exchangeable graphs described above is quite general, so that one can
imagine many simple variants. For example, one can consider the following algorithm: (i) draw a
bias $\theta$ from a prescribed probability distribution $\pi_n$ and (ii) set all the elements of $X_n$ to be 0 or 1
according to the toss of a coin with bias $\theta$. The resulting probability law, say $Q$, is

$$Q\{X_{i,j}^{(n)} = e_{i,j},\ i,j = 1,\dots,n\} = \int_{[0,1]} \theta^{\sum_{i,j} e_{i,j}} (1-\theta)^{n^2 - \sum_{i,j} e_{i,j}}\, \pi_n(d\theta)$$

for any $e_{i,j}$ in $\{0,1\}$, $i,j = 1,\dots,n$; that is, under $Q$, $\{X_{i,j}^{(n)} : i,j = 1,\dots,n\}$ are exchangeable, with de
Finetti measure $\pi_n$.
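
The only difference from the row-wise construction is that a single bias is shared by the whole matrix. A short sketch, with the hypothetical names `sample_completely_exchangeable` and `draw_bias` (a user-supplied sampler for $\pi_n$):

```python
import random


def sample_completely_exchangeable(n, draw_bias, rng=None):
    """Sample an n x n 0/1 matrix whose entries are all exchangeable."""
    rng = rng or random.Random()
    theta = draw_bias(rng)  # step (i): a single bias for the whole matrix
    # step (ii): all n * n entries are coin tosses with that same bias
    return [[1 if rng.random() < theta else 0 for _ in range(n)] for _ in range(n)]
```

Conditionally on the drawn bias the graph has independent edges, so all the randomness specific to this variant sits in the one draw from $\pi_n$.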

2.3 Hierarchical models 

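Another possible variant considers a hierarchy of probability distributions to generate the bias of the
coins. In this case one can take

$$Q^*\{X_{i,j}^{(n)} = e_{i,j},\ i = 1,\dots,m_n,\ j = 1,\dots,n\} = \int_{\mathbb{R}^+} \prod_{i=1}^{m_n} \int_{[0,1]} \theta_i^{\sum_{j=1}^{n} e_{i,j}} (1-\theta_i)^{n - \sum_{j=1}^{n} e_{i,j}}\, \pi_n(d\theta_i \mid \alpha)\, \lambda_n(d\alpha) \tag{4}$$

$\lambda_n$ being a probability on $\mathbb{R}^+$ and $\pi_n(d\theta \mid \alpha)$ being a kernel on $[0,1] \times \mathbb{R}^+$, that is: for every $\alpha$ in
$\mathbb{R}^+$, $\pi_n(\cdot \mid \alpha)$ is a probability measure on the Borel $\sigma$-field of $[0,1]$ and, for every measurable subset $B$ of $[0,1]$,
$\alpha \mapsto \pi_n(B \mid \alpha)$ is measurable.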

3 Connectivities 

We will carry out the main discussion for the case of partially exchangeable graphs of Subsection 2.1.
Some brief comments on the other variants are reported in Section 6. In the rest of the
paper, with the exception of Section 5, we suppose that all the random elements are defined on the
same probability space $(\Omega, \mathcal{F}, P)$ and we denote by $E(Y)$ the mathematical expectation of a given
random variable $Y$ with respect to $P$. With a slight abuse of notation we shall use interchangeably $G_n$,
the random graph, and its adjacency matrix $X_n = [X_{i,j}^{(n)}]_{i,j}$.

3.1 In and out connectivity

The first quantities that we want to characterize are the graph degree distributions. The random variable
$Z_{m_n,j} := \sum_{i=1}^{m_n} X_{i,j}^{(n)}$ represents the in-degree of the $j$-th node in the random graph, while
$S_{n,i} := \sum_{j=1}^{n} X_{i,j}^{(n)}$ can be seen as the out-degree of the $i$-th node ($1 \le i \le m_n$). Note that $(Z_{m_n,1},\dots,Z_{m_n,n})$
are identically distributed, as are $(S_{n,1},\dots,S_{n,m_n})$. Moreover, $(S_{n,1},\dots,S_{n,m_n})$ are independent,
and each $S_{n,i}$ is a sum of exchangeable Boolean random variables, while $(Z_{m_n,1},\dots,Z_{m_n,n})$ are dependent.
Clearly, the mean degrees are equal to $m_n\mu_n$ and $n\mu_n$, respectively, where
$\mu_n := P\{X_{i,j}^{(n)} = 1\} = \int_{[0,1]} \theta\,\pi_n(d\theta)$ is the probability of the link $i \to j$. Note that, while in the Erdős-Rényi model
$n\mu_n = \lambda$ for every $n$, in this case $n\mu_n$ generally depends on $n$. On the other hand, when (3) is in
force, using the well-known fact that $E(Y) = \int_0^{+\infty} (1 - G(y))\,dy$ for any positive random variable $Y$
with distribution function $G$, one gets

$$n\mu_n = \int_{[0,1]} n\theta\,\pi_n(d\theta) = \int_0^n \big(1 - F(x)/F(n)\big)\,dx,$$

and hence, if $\mu := \int_0^{+\infty} x\,dF(x) < +\infty$, it follows that $n\mu_n = \mu + o(1)$. The (marginal) degree
distributions are given by

$$P\{S_{n,i} = k\} = \binom{n}{k} \int_{[0,1]} \theta^k (1-\theta)^{n-k}\, \pi_n(d\theta) \tag{5}$$

and

$$P\{Z_{m_n,j} = k\} = \binom{m_n}{k} \mu_n^k (1-\mu_n)^{m_n-k}. \tag{6}$$

With the above expressions, the problem of determining the asymptotic distribution of $(Z_{m_n,1})_{n\ge1}$ and
$(S_{n,1})_{n\ge1}$ is simply cast as a central limit problem for triangular arrays. In fact, while for $(Z_{m_n,1})_{n\ge1}$
a classical central limit theorem (CLT) for triangular arrays of independent random variables works,
for $(S_{n,1})_{n\ge1}$ one needs a CLT for exchangeable random variables. General CLTs for exchangeable
random variables are well known (see, for instance, [25, 51]). Here the situation is particularly simple,
since we are dealing with 0-1 random variables. Consequently, we need only a simple ad hoc CLT
for exchangeable Boolean random variables.

Let $\Theta_n$ be a random variable taking values in $[0,1]$ with distribution $\pi_n$ and set $T_n := n\Theta_n$.
The next proposition shows that, under a set of reasonable assumptions on $T_n$, the limit law of
$(S_{n,1})_{n\ge1}$ is a mixture of Poisson distributions, while the limit law of $(Z_{m_n,1})_{n\ge1}$ is a simple Poisson
distribution.

Proposition 3.1 (CLT) If $(T_n)_{n\ge1}$ converges in distribution to a random variable $T$ with distribution
function $F$, then, for every integer $j \ge 1$,

$$\lim_{n\to+\infty} P\{S_{n,j} = k\} = E\left[\frac{T^k}{k!} e^{-T}\right] = \int_0^{+\infty} \frac{t^k}{k!} e^{-t}\, dF(t) \qquad (k = 0,1,\dots). \tag{7}$$

Moreover, if for some $\lambda > 0$ and for a sequence $(a_n)_{n\ge1}$

$$\lim_{n\to+\infty} a_n E(T_n) = \lim_{n\to+\infty} n a_n \int_{[0,1]} \theta\,\pi_n(d\theta) = \lambda \tag{8}$$

holds true, then, for all integers $k \ge 0$ and $j \ge 1$,

$$\lim_{n\to+\infty} P\{Z_{m_n,j} = k\} = \frac{\lambda^k e^{-\lambda}}{k!}$$

with $m_n = [n a_n]$ ($[x]$ being the integer part of $x$).
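
The second statement is the classical Poisson approximation of the binomial law (6). The following sketch compares the two numerically; it is an illustration under an idealized assumption, namely that $n a_n \mu_n = \lambda$ holds exactly rather than only in the limit, and the function names are hypothetical.

```python
import math


def binomial_pmf(k, m, p):
    """P{Z = k} for Z ~ Binomial(m, p), as in equation (6)."""
    return math.comb(m, k) * p**k * (1.0 - p) ** (m - k)


def poisson_pmf(k, lam):
    """Poisson(lam) probability of k."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def in_degree_gap(n, a_n, lam, k_max=20):
    """Largest pointwise gap between (6) and its Poisson limit for k <= k_max."""
    m = int(n * a_n)        # m_n = [n a_n]
    mu = lam / (n * a_n)    # idealized mu_n, so that n a_n mu_n = lam exactly
    return max(abs(binomial_pmf(k, m, mu) - poisson_pmf(k, lam))
               for k in range(k_max + 1))
```

Evaluating `in_degree_gap` for increasing `n` (with `a_n` and `lam` fixed) shows the gap shrinking, in line with the limit stated in the proposition.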

Remark 1 (a) If (3) holds true, then the distribution of $T$ is $F$. Indeed, in this case,

$$\lim_{n\to+\infty} P\{T_n \le x\} = \lim_{n\to+\infty} P\{\Theta_n \le x/n\} = \lim_{n\to+\infty} F(x)/F(n) = F(x) \qquad (x > 0).$$

(b) It is worth noticing that, as a corollary of Theorem 5 in [25], the convergence of $T_n$ is
a necessary and sufficient condition in order to obtain a Poisson mixture as a limit law for $(S_{n,j})_{n\ge1}$.
Hence, the first part of the previous proposition can be proved by invoking such a theorem. Nevertheless,
for the sake of completeness, we shall give here a simple direct proof.



Proof of Proposition 3.1. Since $T_n = n\Theta_n$, by (5) one has

$$P\{S_{n,j} = k\} = E\left[\frac{n!}{(n-k)!\,k!\,n^k}\, T_n^k \left(1 - \frac{T_n}{n}\right)^{n(1-k/n)}\right] = E[\phi_n(T_n)]$$

where

$$\phi_n(x) := \frac{n!}{(n-k)!\,k!\,n^k}\, x^k \left(1 - \frac{x}{n}\right)^{n(1-k/n)} \qquad (0 \le x \le n).$$

Now,

$$E[\phi_n(T_n)] = E[\phi(T_n)] + R_n$$

where $\phi(x) = \frac{1}{k!} x^k e^{-x}$ and $R_n = E[\phi_n(T_n)] - E[\phi(T_n)]$. It is plain to check that $\phi_n$ converges uniformly
on every compact set to $\phi$. Moreover, since $(T_n)_{n\ge1}$ converges in distribution, by Prohorov's theorem
(see, e.g., Theorem 16.3 in [32]) it is tight, that is, for every $\epsilon > 0$ there exists $K > 0$ such
that $\sup_{n\ge1} P\{|T_n| > K\} \le \epsilon$. Hence, one gets that

$$\limsup_{n\to+\infty} |R_n| \le \limsup_{n\to+\infty} \Big[\sup_{x\in[0,K]} |\phi_n(x) - \phi(x)| + 2P\{|T_n| > K\}\Big] \le 2\epsilon.$$

At this stage, the first part of the thesis follows immediately; indeed, $(T_n)_{n\ge1}$ converges in distribution
if and only if $E[f(T_n)] \to E[f(T)]$ for every bounded continuous function $f$, and $\phi$ is bounded and
continuous.

The second part of the thesis follows from the classical Poisson approximation to the binomial
distribution using (8). Indeed

$$\mu_n = \frac{\lambda}{n a_n}(1 + o(1))$$

and $[n a_n] = m_n$ with $n a_n \to +\infty$. To see this last fact, observe that, since $T_n$ converges in distribution
to $T$, $\Theta_n$ goes to zero in probability. Using this last fact it is easy to see that $E\Theta_n = \int_{[0,1]} \theta\,\pi_n(d\theta)$ goes
to zero, hence $n a_n$ must diverge. $\diamond$

Since (7) is a mixture of Poisson distributions with weight given by $F$, the above result can be
used to shift the choice of $\pi_n$ onto the perhaps more intuitive choice of the mixing distribution $F$.
Clearly, the emergence of heavy-tailed distributions is not a simple consequence of (1), but depends on
the choice of $\pi_n$. The following example describes a mixing probability which gives rise to a light-tailed
out-degree distribution.



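Example 1 Take

$$\pi_n(d\theta) = \frac{n\gamma\, e^{-n\gamma\theta}}{1 - e^{-\gamma n}}\, I_{[0,1]}(\theta)\, d\theta \qquad (\gamma > 0),$$

or, in other words, assume (3) with $F(x) = \int_0^x \gamma e^{-\gamma t}\,dt = 1 - e^{-\gamma x}$. With this choice, according
to Proposition 3.1, the limit distribution of $S_{n,1}$ is an exponential mixture of Poisson distributions.
Precisely, we find it to be a geometric distribution; indeed

$$\lim_{n\to+\infty} P\{S_{n,j} = k\} = \gamma \int_0^{+\infty} \frac{t^k}{k!} e^{-(1+\gamma)t}\,dt = \frac{\gamma}{1+\gamma}\,(1+\gamma)^{-k} \qquad (k = 0,1,\dots).$$

Moreover, $a_n = 1$ and $\lambda = 1/\gamma$ satisfy the conditions of Proposition 3.1, yielding

$$\lim_{n\to+\infty} P\{Z_{n,1} = k\} = \frac{\gamma^{-k} e^{-1/\gamma}}{k!}.$$

As a generalization of the previous example one can take, instead of an exponential distribution, a gamma
distribution, i.e.

$$F(x) = \int_0^x \frac{\gamma^r t^{r-1} e^{-\gamma t}}{\Gamma(r)}\,dt.$$

It is easy to check that the limiting distribution is a negative binomial distribution with parameter $r$.
That is,

$$\lim_{n\to+\infty} P\{S_{n,j} = k\} = \binom{r+k-1}{k} \left(\frac{\gamma}{1+\gamma}\right)^{r} \left(\frac{1}{1+\gamma}\right)^{k} \qquad (k = 0,1,\dots).$$

Moreover,

$$\lim_{n\to+\infty} P\{Z_{n,1} = k\} = \frac{(r/\gamma)^k e^{-r/\gamma}}{k!}.$$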
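
The two closed forms in this example can be checked numerically against the Poisson-mixture integral (7). The sketch below uses a plain midpoint rule; the function names, the truncation point `upper` and the number of `steps` are illustrative assumptions, not part of the original argument.

```python
import math


def poisson_mixture_pmf(k, density, upper=60.0, steps=20000):
    """Midpoint-rule approximation of int_0^upper t^k e^(-t) / k! * density(t) dt."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        # Poisson weight t^k e^(-t) / k!, computed in log space for stability
        weight = math.exp(k * math.log(t) - t - math.lgamma(k + 1))
        total += weight * density(t)
    return total * h


def geometric_limit(k, gamma):
    """Limit for the exponential seed: gamma / (1 + gamma)^(k + 1)."""
    return gamma / (1.0 + gamma) ** (k + 1)


def negative_binomial_limit(k, r, gamma):
    """Limit for the gamma seed with shape r and rate gamma."""
    log_binom = math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_binom) * (gamma / (1.0 + gamma)) ** r * (1.0 + gamma) ** (-k)
```

For instance, `poisson_mixture_pmf(3, lambda t: 0.5 * math.exp(-0.5 * t))` should agree with `geometric_limit(3, 0.5)` up to the quadrature error, and taking `r = 1` in `negative_binomial_limit` recovers the geometric case.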



In the above example, mixing the Poisson distribution with exponential weights proves
insufficient to produce a power-law distribution. In other instances, a suitable choice of $F$ in (7) can
give rise to an out-degree probability distribution with heavy tails. Consider the following

Example 2 Assume a slight generalization of (2), i.e.

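$$\pi_n(d\theta) = Z_n^{-1}\, \theta^{-\beta}\, g(n\theta)\, I_{(\alpha/n,1]}(\theta)\, d\theta \tag{9}$$

with $0 < c_1 \le g(\tau) \le c_2 < +\infty$ for every $\tau$ in $[0,+\infty)$ and

$$Z_n = \int_{\alpha/n}^{1} \theta^{-\beta} g(n\theta)\, d\theta.$$

Note that (9) satisfies (3) with

$$F(x) = \frac{\int_{\alpha}^{x} t^{-\beta} g(t)\,dt}{\int_{\alpha}^{+\infty} t^{-\beta} g(t)\,dt} \qquad (x \ge \alpha).$$

Hence, it is straightforward to verify that Proposition 3.1 yields

$$\lim_{n\to+\infty} P\{S_{n,j} = k\} = \frac{1}{k!}\, \frac{\int_{\alpha}^{+\infty} t^{k-\beta} e^{-t} g(t)\,dt}{\int_{\alpha}^{+\infty} t^{-\beta} g(t)\,dt} =: q_{\alpha,\beta,g}(k).$$

We now show that such a distribution is a power-law-tailed distribution. In order to prove this, let us
consider first the special case in which $g = 1$, i.e. the older (2). With this choice, we get

$$\lim_{n\to+\infty} P\{S_{n,j} = k\} = \frac{\alpha^{\beta-1}(\beta-1)}{k!} \int_{\alpha}^{+\infty} t^{k-\beta} e^{-t}\,dt = q_{\alpha,\beta,1}(k) =: p_{\alpha,\beta}(k) \qquad (k \ge 0).$$

Hence, if $k > \beta$, write

$$p_{\alpha,\beta}(k) = \alpha^{\beta-1}(\beta-1) \left[\frac{\Gamma(k+1-\beta)}{\Gamma(k+1)} - \frac{1}{\Gamma(k+1)} \int_0^{\alpha} t^{k-\beta} e^{-t}\,dt\right]$$

and note that, by the well-known asymptotic expansion for the gamma function,

$$\frac{\Gamma(k+1-\beta)}{\Gamma(k+1)} = \frac{1}{k^{\beta}}(1 + o(1)) \quad \text{as } k \to +\infty.$$

Moreover,

$$\frac{k^{\beta}}{\Gamma(k+1)} \int_0^{\alpha} t^{k-\beta} e^{-t}\,dt = o(1) \quad \text{as } k \to +\infty.$$

Consequently, we get

$$p_{\alpha,\beta}(k) = \alpha^{\beta-1}(\beta-1)\, \frac{1}{k^{\beta}}\,(1 + o(1)).$$

Now note that, since

$$\frac{c_1}{c_2}\, p_{\alpha,\beta}(k) \le q_{\alpha,\beta,g}(k) \le \frac{c_2}{c_1}\, p_{\alpha,\beta}(k),$$

$k \mapsto q_{\alpha,\beta,g}(k)$ has power-law tails also for $g \ne 1$.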
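
The power-law tail for $g = 1$ can also be inspected numerically, using the same split of the integral into a gamma-function ratio and a correction over $[0,\alpha]$. This is a hedged sketch: the function names are hypothetical, the correction is evaluated with a simple midpoint rule, and the code assumes $k > \beta$.

```python
import math


def lower_correction(k, alpha, beta, steps=2000):
    """Midpoint-rule value of (1 / Gamma(k+1)) * int_0^alpha t^(k-beta) e^(-t) dt."""
    h = alpha / steps
    log_norm = math.lgamma(k + 1)
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        # work in log space so that large k neither overflows nor errors
        total += math.exp((k - beta) * math.log(t) - t - log_norm)
    return total * h


def p_limit(k, alpha, beta):
    """Limit out-degree probability for g = 1, valid for k > beta."""
    main = math.exp(math.lgamma(k + 1 - beta) - math.lgamma(k + 1))
    return alpha ** (beta - 1) * (beta - 1) * (main - lower_correction(k, alpha, beta))


def tail_ratio(k, alpha, beta):
    """Ratio of p_limit(k) to the asymptote alpha^(beta-1) (beta-1) k^(-beta)."""
    return p_limit(k, alpha, beta) * k ** beta / (alpha ** (beta - 1) * (beta - 1))
```

Under these assumptions `tail_ratio(k, 2.0, 2.5)` should approach 1 as `k` grows, which is the numerical counterpart of the $(1 + o(1))$ factor in the expansion.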

Finally, the following example shows a more complex distribution, itself a mixture, leading
to a heavy tail.

Example 3 Given $a > 1$ and $s > 1$, set, for every positive $x$,

$$f_{a,s}(x) := \frac{1}{\Gamma(s)\Phi(1,s,a)} \int_0^{+\infty} \tau^{s-1} e^{-\tau(a-1)}\, e^{-x(e^{\tau}-1)}\,d\tau$$

where $\Phi(z,s,a)$ is the well-known Lerch transcendent, defined as $\Phi(z,s,a) := \sum_{k\ge0} z^k (a+k)^{-s}$, for
every complex $z$ with $|z| < 1$. See, for instance, 9.550 in [27]. Note that $f_{a,s}(x) > 0$. Moreover,
by means of the integral representation $\Gamma(s)\Phi(z,s,a) = \int_0^{+\infty} \tau^{s-1} e^{-(a-1)\tau} (e^{\tau} - z)^{-1}\,d\tau$
(see 9.556 in [27]), one can check that $\int_0^{+\infty} f_{a,s}(x)\,dx = 1$. In other words, $f_{a,s}$ defines a probability
density. Note that $f_{a,s}$ is itself a mixture of exponential densities. Indeed, it can be rewritten as

$$f_{a,s}(x) = \int_0^{+\infty} (e^{\tau}-1)\, e^{-(e^{\tau}-1)x}\, \frac{\tau^{s-1} e^{-\tau(a-1)}}{\Gamma(s)\Phi(1,s,a)(e^{\tau}-1)}\,d\tau = \int_0^{+\infty} u\, e^{-ux}\, \frac{\log^{s-1}(u+1)}{\Gamma(s)\Phi(1,s,a)\,u\,(u+1)^{a}}\,du$$

with

$$\int_0^{+\infty} \frac{\tau^{s-1} e^{-\tau(a-1)}}{\Gamma(s)\Phi(1,s,a)(e^{\tau}-1)}\,d\tau = \int_0^{+\infty} \frac{\log^{s-1}(u+1)}{\Gamma(s)\Phi(1,s,a)\,u\,(u+1)^{a}}\,du = 1.$$

It can be verified, with the help of Fubini's theorem and the already mentioned integral representation
of the Lerch transcendent, that for every real $q$ with $|q| < 1$,

$$\sum_{k\ge0} (iq)^k \int_0^{+\infty} \frac{t^k}{k!} e^{-t} f_{a,s}(t)\,dt = \int_0^{+\infty} e^{iqt} e^{-t} f_{a,s}(t)\,dt = \frac{\Phi(iq,s,a)}{\Phi(1,s,a)} = \sum_{k\ge0} \frac{(iq)^k}{\Phi(1,s,a)(a+k)^{s}}$$

(where $i := \sqrt{-1}$), from which it follows that

$$\int_0^{+\infty} \frac{t^k}{k!} e^{-t} f_{a,s}(t)\,dt = \frac{1}{\Phi(1,s,a)(a+k)^{s}}.$$

Hence, if one takes an exchangeable random graph $G_n$ with mixing distribution satisfying (3) with

$$F(x) := \int_0^{x} f_{a,s}(t)\,dt,$$

then the limit law of $S_{n,1}$ is given by

$$\lim_{n\to+\infty} P\{S_{n,1} = k\} = \int_0^{+\infty} \frac{t^k}{k!} e^{-t} f_{a,s}(t)\,dt = \Phi(1,s,a)^{-1}(a+k)^{-s}$$

for every $k \ge 0$.

As the above examples show, the model can produce graphs with disparate features, depending
on the choice of the probability distribution of the coin biases. In particular, it is interesting to
investigate under which conditions heavy-tailed distributions emerge as limit distributions of the
out-degree. If one supposes that $T_n$ converges in law to a random variable with probability distribution
function $F$, we have shown how the question can be reduced to the problem of determining
under which conditions on $F$ the probability defined by (7) has heavy tails. It is worth noticing that
mixtures of Poisson distributions have been extensively studied (see, e.g., [28]). Let us briefly recall
some useful properties of such distributions. First of all, if

$$p_k = \int_0^{+\infty} \frac{1}{k!}\, t^k e^{-t}\,dF_i(t) \qquad (k \ge 0,\ i = 1,2)$$

for two distribution functions $F_1$ and $F_2$ with $F_i(x) = 0$ for every $x < 0$, then $F_1 = F_2$; this simple fact
was first noticed in [23], see also Theorem 2.1 (i) in [28]. Hence one hopes to recover many properties
of $p_k := \int_0^{+\infty} \frac{1}{k!}\, t^k e^{-t}\,dF(t)$ from the properties of $F$. In particular, Theorem 2.1 in [57] states that if
$F$ has a density $f$ with respect to the Lebesgue measure or to the counting measure, such that

$$f(x) = L(x)\, x^{\alpha} \exp\{-\beta x\}(1 + o(1)) \quad \text{as } x \to +\infty$$

where $L$ is locally bounded and varies slowly at infinity, $\beta \ge 0$, $-\infty < \alpha < +\infty$ (with $\alpha < -1$ if
$\beta = 0$), then

$$p_k = L(k)\,(1+\beta)^{-(\alpha+1)} \left(\frac{1}{1+\beta}\right)^{k} k^{\alpha}\,(1 + o(1)) \quad \text{as } k \to +\infty.$$

Recall that a slowly varying function $L$ is a measurable function such that

$$\lim_{x\to+\infty} L(xt)/L(x) = 1$$

for every positive $t$. Under no assumptions on $F$ we have the following very simple

Lemma 3.2 Let $F$ be a distribution function with $F(x) = 0$ for every $x < 0$, and set
$p_k := \int_0^{+\infty} \frac{1}{k!}\, t^k e^{-t}\,dF(t)$. Then, for every positive $\gamma$,

$$\sum_{k\ge0} k^{\gamma} p_k < +\infty$$

if and only if

$$\int_0^{+\infty} t^{\gamma}\,dF(t) < +\infty.$$

The proof is deferred to the Appendix.

It is also worth mentioning that the law of a random variable $T$ is a mixture of Poisson distributions
if and only if its generating function $G_T(s) = E(s^T)$ is absolutely monotone in $(-\infty, 1)$, that is, if
$G_T^{(n)}(s) \ge 0$ for every integer $n$ and $s$ in $(-\infty, 1)$. See [32] and Proposition 2.2 in [28]. Finally,
we recall that the sequence $(p_k)_{k\ge1}$ inherits many properties from $F$. For example, $(p_k)_{k\ge1}$ has a
monotone density if $F$ has a monotone distribution, $(p_k)_{k\ge1}$ has a log-convex density if $F$ has a log-convex
distribution, and $(p_k)_{k\ge1}$ is infinitely divisible if $F$ is so. For more details see, for instance, [28, 54].

The next subsections will deal with the computation of interesting observables that go beyond 
the degree distributions. 

3.2 The hub size 

As a first example of observable, we discuss the size of the so-called hub, i.e. the node having maximal 
out-degree among the nodes (thus, in many concrete networks, being the most important for routing 
and the most vulnerable to attack, see, e.g. 0). The hub size is defined by the expression 

$$H_n := \max_{i=1,\dots,m_n} S_{n,i}.$$

In particular, the most interesting case for the behavior of the hub is when the tail of the out-degree is
power-law, as this means that there can be no characteristic size for the hub. As we will explain, it is
interesting to give an analytical expression of the limit law of this quantity under a suitable rescaling $b_n$.
The idea is very simple: by stochastic independence, it is clear that $P\{H_n \le x b_n\} = (1 - P\{S_{n,1} > x b_n\})^{m_n}$,
where $x > 0$ is any positive number. Now, after setting $L := \sup\{y > 0 : \limsup_n [y b_n]/n < 1\}$,
if we can prove that $P\{S_{n,1} \le x b_n\} = 1 - g(x)/m_n + o(1/m_n)$ for any $x < L$, then

$$\lim_{n\to+\infty} P\{H_n \le x b_n\} = e^{-g(x)}\, I_{[0,L)}(x) + I_{[L,+\infty)}(x).$$
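
The independence identity used above is easy to evaluate once an out-degree distribution is available. The following sketch is only an illustration; `hub_cdf` and its list-based `pmf` argument are hypothetical names introduced here.

```python
def hub_cdf(pmf, m, h):
    """P{H <= h} for the maximum of m i.i.d. out-degrees.

    pmf[k] is P{S = k} for k = 0, ..., len(pmf) - 1.
    """
    cdf_at_h = sum(pmf[: h + 1])  # P{S <= h}
    return cdf_at_h ** m          # independence across the m rows
```

Feeding it a finite-size out-degree distribution such as (5), and letting the number of rows grow together with the threshold, gives a direct numerical view of the rescaled limit discussed in the text.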

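We will show that, in some situations, it is possible to determine explicitly $g$, $b_n$ and $L$. The following
proposition concerns the hub behavior in the case of heavy tails for the out-degree.

Proposition 3.3 Suppose there exist two positive constants $\eta$, $c_\eta$, a sequence of positive numbers
$(c_{\eta,n})_{n\ge1}$, and a sequence of functions $(r_n)_{n\ge1}$, such that, for every $t$ in $(0,1)$,

$$\int_{(t,1]} \pi_n(d\theta) = \frac{c_{\eta,n}}{(nt)^{\eta}} + r_n(t),$$

$\lim_{n\to+\infty} c_{\eta,n} = c_\eta$ and

$$\frac{1}{B([b_n x]+1,\, n-[b_n x])} \int_0^1 t^{[b_n x]} (1-t)^{n-[b_n x]-1}\, r_n(t)\,dt = o\left(\frac{1}{m_n}\right), \tag{10}$$

with $b_n := m_n^{1/\eta}$ and $B(\alpha,\beta) := \int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du$. Then

$$\lim_{n\to+\infty} P\{H_n \le [x b_n]\} = e^{-c_\eta x^{-\eta}}\, I_{[0,L)}(x) + I_{[L,+\infty)}(x)$$

where

$$L := \sup\{y > 0 : \limsup_{n\to+\infty}\, [y\, m_n^{1/\eta}]/n < 1\}.$$
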

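Proof. First of all let us start by recalling the well-known relation

$$\sum_{k=[b_n x]+1}^{n} \binom{n}{k} \theta^k (1-\theta)^{n-k} = \frac{1}{B([b_n x]+1,\, n-[b_n x])} \int_0^{\theta} t^{[b_n x]} (1-t)^{n-[b_n x]-1}\,dt \tag{11}$$

where $B(\alpha,\beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$. See, e.g., 9.2.5 in [37]. Hence, by (5), (11)
and Fubini's theorem one gets

$$P\{S_{n,1} > [x b_n]\} = \int_{[0,1]} \sum_{k=[x b_n]+1}^{n} \binom{n}{k} \theta^k (1-\theta)^{n-k}\, \pi_n(d\theta)$$

$$= \int_{[0,1]} \int_0^{\theta} \frac{t^{[b_n x]} (1-t)^{n-[b_n x]-1}}{B([b_n x]+1,\, n-[b_n x])}\,dt\, \pi_n(d\theta) = \frac{1}{B([b_n x]+1,\, n-[b_n x])} \int_0^1 t^{[b_n x]} (1-t)^{n-[b_n x]-1}\, F_n^*(t)\,dt$$

with

$$F_n^*(t) := \int_{(t,1]} \pi_n(d\theta) = P\{\Theta_n > t\}.$$

Now, by hypothesis

$$F_n^*(t) = \frac{c_{\eta,n}}{(nt)^{\eta}} + r_n(t).$$

Then

$$P\{S_{n,1} > [x b_n]\} = \frac{1}{B([b_n x]+1,\, n-[b_n x])}\, \frac{c_{\eta,n}}{n^{\eta}} \int_0^1 t^{[b_n x]-\eta} (1-t)^{n-[b_n x]-1}\,dt + R_n(x)$$

with

$$R_n(x) := \frac{1}{B([b_n x]+1,\, n-[b_n x])} \int_0^1 t^{[b_n x]} (1-t)^{n-[b_n x]-1}\, r_n(t)\,dt.$$

Finally, using once more the asymptotic expression $\Gamma(n+a)/\Gamma(n+b) = n^{a-b}(1 + o(1))$ as $n \to +\infty$,
one obtains

$$P\{S_{n,1} > [x b_n]\} = \frac{1}{m_n} \left( \frac{m_n\, c_{\eta,n}\, \Gamma([b_n x]+1-\eta)\,\Gamma(n+1)}{n^{\eta}\,\Gamma(n+1-\eta)\,\Gamma([b_n x]+1)} + o(1) \right)$$

which is

$$P\{S_{n,1} > [x b_n]\} = \frac{1}{m_n} \left[ \frac{c_\eta}{x^{\eta}} + o(1) \right].$$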

We now give two simple conditions that imply the validity of (10) and can be useful in
concrete applications. The first condition will be used in the example that we spell out in detail in
the second part of this paper (Proposition 7.2).

Lemma 3.4 If for some $\alpha > 0$, $C < +\infty$ and $\eta > 0$

$$|r_n(t)| \le C \left( I\{nt \le \alpha\}\big(1 + (nt)^{-\eta}\big) + n^{-\eta} \right) \tag{12}$$

then (10) holds true provided that $m_n$ is such that $m_n/n^{\eta} = o(1)$.

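Proof. Set $\beta_n := [x b_n]$ and

$$J_n(d) := \frac{m_n}{B(\beta_n+1+d,\, n-\beta_n)} \int_0^{\alpha/n} t^{\beta_n+d} (1-t)^{n-\beta_n-1}\,dt.$$

Hence,

$$m_n |R_n(x)| \le C m_n \left[ \frac{1}{B(\beta_n+1,\, n-\beta_n)} \int_0^{\alpha/n} \big(1 + (nt)^{-\eta}\big)\, t^{\beta_n} (1-t)^{n-\beta_n-1}\,dt + \frac{1}{n^{\eta}} \right]$$

$$\le C\{J_n(0) + J_n(-\eta)\,\beta_n^{-\eta}(1+o(1)) + o(1)\}.$$

It remains to show that $J_n(0) + J_n(-\eta)\,\beta_n^{-\eta} = o(1)$. With the help of the Stirling formula, one has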

' r(/3„ + 1 + d)r(n - /?„) ' 

m„r(n + l + d) 

< C,m^ + 1 + _ ^„)n-,„-i/2 ^^P{^(" + 1 + d) + /3„ + 1 + d + n - /34 

exp{log(a)(l + d + AOKl + i±ii)"(i+^) 

= Cinin 



< C2m, 



exp{log(a)(l + d + - log(/3„)(l/2 + d + /?„)} 



[(1 _ &.)n/0„-|;3„-;3S/n 

< Csmn exp{log(Q)(l + d + /?„) - log(/3„)(l/2 + d + /3„) + (/3„ - pl/n)} 

< C4exp{log(m„) - C5/3„log(/3„)}. 

Since $\beta_n = x\, m_n^{1/\eta}(1 + o(1))$ and $m_n^{1/\eta}/n = o(1)$, the thesis follows easily. $\diamond$

We conclude this subsection by observing that when (3) is in force, then

$$\int_{(t,1]} \pi_n(d\theta) = 1 - \frac{F(nt)}{F(n)} = \frac{F(n) - 1 + 1 - F(nt)}{F(n)},$$

hence it is natural to assume some hypotheses on $1 - F(x)$. In particular, recall that a distribution
function $F$ is in the domain of attraction of the extreme value Fréchet distribution if and only if

$$\sup\{x : F(x) < 1\} = +\infty \quad \text{and} \quad 1 - F(x) = \frac{1}{x^{\eta}} L(x) \tag{13}$$

where $L$ is a slowly varying function, see [26]. This means that (13) holds if and only if, given a
sequence of independent and identically distributed random variables $(\xi_n)_{n\ge1}$ with common law $F$,

$$\lim_{n\to+\infty} P\{a_n^{-1} \max\{\xi_1,\dots,\xi_n\} \le x\} = e^{-x^{-\eta}} \qquad (x > 0)$$

for a suitable normalizing sequence $(a_n)_n$. In point of fact, (13) is not sufficient, in our case, to ensure
that $r_n$ is a remainder of the right order. Hence, we need a stronger requirement.

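Lemma 3.5 Assume that (3) is in force with

$$1 - F(x) = \frac{c_\eta}{x^{\eta}}\,[1 + h(x)] \tag{14}$$

for some $\eta > 0$ and $0 < c_\eta < +\infty$, with

$$|h(x)| \le A \left[ \frac{1}{x^{\delta_1}} + \frac{1}{x^{\delta_2}} \right] \qquad (x > 0,\ A < +\infty,\ \delta_1, \delta_2 > 0),$$

then (10) holds true with $m_n/n^{\eta} = o(1)$.

Proof. Assume that $\delta_1 = \delta$ and $\delta_2 = \delta$. In the same notation of the proof of Proposition 3.3,

$$r_n(t) = -\frac{c_\eta}{F(n)\, n^{\eta}}\,(1 + h(n)) + \frac{c_\eta\, h(nt)}{F(n)\,(nt)^{\eta}}.$$

Now

$$R_n = R_n(x) = \frac{1}{B([b_n x]+1,\, n-[b_n x])} \int_{[0,1]} t^{[b_n x]} (1-t)^{n-[b_n x]-1}\, r_n(t)\,dt = R_n^{(1)} + R_n^{(2)}$$

with

$$R_n^{(1)} := -\frac{c_\eta (1 + h(n))}{F(n)\, n^{\eta}}\, \frac{1}{B([b_n x]+1,\, n-[b_n x])} \int_{[0,1]} t^{[b_n x]} (1-t)^{n-[b_n x]-1}\,dt = -\frac{c_\eta (1 + h(n))}{F(n)\, n^{\eta}}$$

and

$$R_n^{(2)} := \frac{1}{B([b_n x]+1,\, n-[b_n x])} \int_{[0,1]} t^{[b_n x]} (1-t)^{n-[b_n x]-1}\, \frac{c_\eta\, h(nt)}{F(n)\,(nt)^{\eta}}\,dt.$$

Finally

$$|R_n^{(2)}| \le \frac{2A\, c_\eta}{F(n)\, n^{\eta+\delta}\, B([b_n x]+1,\, n-[b_n x])} \int_{[0,1]} t^{[b_n x]-\eta-\delta} (1-t)^{n-[b_n x]-1}\,dt$$

$$= \frac{2A\, c_\eta}{F(n)}\, \frac{B([b_n x]+1-\eta-\delta,\, n-[b_n x])}{n^{\eta+\delta}\, B([b_n x]+1,\, n-[b_n x])}$$

$$= \frac{2A\, c_\eta}{F(n)}\, \frac{\Gamma([b_n x]+1-\eta-\delta)\,\Gamma(n+1)}{n^{\eta+\delta}\,\Gamma(n+1-\eta-\delta)\,\Gamma([b_n x]+1)} \le \frac{A'}{[b_n x]^{\eta+\delta}}$$

for a suitable constant $A'$; hence, since $b_n^{\eta} = m_n$, $|R_n^{(2)}| = o(m_n^{-1})$. The general case follows in the
same way.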



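Example 4 As an example, it is easy to see that

$$1 - F(x) = \frac{\alpha^{\eta}}{(\alpha + x)^{\eta}} \qquad (x > 0,\ \alpha > 0,\ \eta > 0)$$

satisfies the assumption of the previous lemma. In point of fact

$$1 - F(x) = \frac{\alpha^{\eta}}{x^{\eta}} + \frac{\alpha^{\eta}}{x^{\eta}}\, \frac{x^{\eta} - (\alpha + x)^{\eta}}{(\alpha + x)^{\eta}}$$

hence

$$|h(x)| \le \frac{(\alpha + x)^{\eta} - x^{\eta}}{x^{\eta}}.$$

If $x \le 1$, then $|h(x)| \le (1+\alpha)^{\eta}/x^{\eta}$, while if $x > 1$

$$|h(x)| \le \left(1 + \frac{\alpha}{x}\right)^{\eta} - 1.$$

Since $t \mapsto t^{\eta}$ is a Lipschitz function of constant $\eta(1+\alpha)^{\eta-1}$ on $[1, 1+\alpha]$, if $x > 1$, it follows that
$(1 + \alpha/x)^{\eta} - 1 \le \eta(1+\alpha)^{\eta-1}|1 + \alpha/x - 1| = \eta(1+\alpha)^{\eta-1}\alpha/x$. Summarizing

$$|h(x)| \le A \left[ \frac{1}{x} + \frac{1}{x^{\eta}} \right].$$

4 Some non-local features of the graphs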

In this section, we deal with the subgraph content and the mean number of roots and leaves of the
model of Subsection 2.1.



4.1 Subgraphs 

The simple exchangeable structure of the generated random graphs makes it possible to compute
easily the mean value of the number of subgraphs "of a given shape" contained in the graph, which can
be used for the discovery of "network motifs" [53, 41, 42, 38].

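Consider a subgraph, with $k$ nodes and $m$ edges, given by

$$H = \{i_1 \to i_{(1,1)}, \dots, i_1 \to i_{(1,m_1)},\ i_2 \to i_{(2,1)}, \dots,\ i_k \to i_{(k,1)}, \dots, i_k \to i_{(k,m_k)}\}$$

with $\sum_{i=1}^{k} m_i = m$. Of course

$$P\{H \in G_n\} = \int_{[0,1]} \theta_1^{m_1}\, \pi_n(d\theta_1) \int_{[0,1]} \theta_2^{m_2}\, \pi_n(d\theta_2) \cdots \int_{[0,1]} \theta_k^{m_k}\, \pi_n(d\theta_k).$$

Denote by $\mathcal{T}$ the set of all subgraphs isomorphic to $H$ contained in the complete $n$ graph and by
$N(H)$ the cardinality of such a set. Since the number $\mathcal{N}_H(G_n)$ of subgraphs isomorphic to $H$ contained in
$G_n$ can be clearly written as

$$\mathcal{N}_H(G_n) = \sum_{g \in \mathcal{T}} I\{g \in G_n\},$$

it follows that

$$E[\mathcal{N}_H(G_n)] = N(H)\, P\{H \in G_n\};$$

indeed, by exchangeability, $P\{g \in G_n\} = P\{H \in G_n\}$ for every $g$ in $\mathcal{T}$.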

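For example, let us consider the $k$-cycles. A subgraph $H$ is called a $k$-cycle if it has the form
$i_1 \to i_2 \to \cdots \to i_k \to i_1$. If $\mathcal{N}_{C_k}(G_n)$ denotes the number of $k$-cycles contained in $G_n$, then,
since the complete graph on $n$ nodes contains $(k-1)!\binom{n}{k}$ directed $k$-cycles,

$$E[\mathcal{N}_{C_k}(G_n)] = (k-1)! \binom{n}{k} \mu_n^k. \tag{15}$$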

Things are slightly more complicated for rectangular matrices because in the evaluation of
$N(H)$ one needs to take into consideration also the constraints given by the fact that only $m_n$ nodes
can send outgoing edges. In what follows we will discuss mainly the case of square matrices.

As we shall see, in the study of transcriptional networks the 3-cycle $i_1 \to i_2$, $i_2 \to i_3$, $i_3 \to i_1$
is called a "feedback loop" (fbl), while a "feedforward loop" (ffl) is a triangle of the form
$i_1 \to i_2 \to i_3$, $i_1 \to i_3$. Following the procedure described above, one gets

\[ E[\mathcal{N}_{fbl}(G_n)] = 2\binom{n}{3}\mu_n^3. \tag{16} \]

As for the evaluation of feedforward loops, we have

\[ E[\mathcal{N}_{ffl}(G_n)] = 6\binom{n}{3}\int_{[0,1]}\theta^2\,\pi_n(d\theta)\int_{[0,1]}\theta\,\pi_n(d\theta). \tag{17} \]
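As a rough illustration of the two mean values for feedback and feedforward loops, the following Python sketch evaluates them from the first two moments of $\pi_n$ and counts both motifs by brute force on a given adjacency matrix. The function names and the argument names `mu` (first moment of $\pi_n$) and `delta2` (second moment of $\pi_n$) are illustrative choices, not notation from the text.

```python
from itertools import permutations
from math import comb


def expected_fbl(n, mu):
    """Mean number of feedback loops: 2 * C(n,3) * mu^3."""
    return 2 * comb(n, 3) * mu ** 3


def expected_ffl(n, mu, delta2):
    """Mean number of feedforward loops: 6 * C(n,3) * delta2 * mu."""
    return 6 * comb(n, 3) * delta2 * mu


def count_fbl(adj):
    """Count directed 3-cycles a -> b -> c -> a on distinct nodes."""
    n = len(adj)
    hits = sum(
        1
        for a, b, c in permutations(range(n), 3)
        if adj[a][b] and adj[b][c] and adj[c][a]
    )
    return hits // 3  # each cycle is visited from its 3 starting points


def count_ffl(adj):
    """Count triangles a -> b -> c that also contain the edge a -> c."""
    n = len(adj)
    return sum(
        1
        for a, b, c in permutations(range(n), 3)
        if adj[a][b] and adj[b][c] and adj[a][c]
    )
```

When $\pi_n$ is a point mass at 1 every edge is present, so the brute-force counts on the complete directed graph must coincide with the two mean values; this gives a quick consistency check of the combinatorial factors 2 and 6.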

It is worth mentioning that in principle it is possible to compute analytically the variance, as
well as any other moment, of the number of subgraphs isomorphic to a given subgraph. However,
computations become lengthy and cumbersome rather soon. As an example, we consider the variance
of the number of feedback loops and feedforward loops.

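The key point is evaluating $E[\mathcal{N}_{ffl}(G_n)^2]$ and $E[\mathcal{N}_{fbl}(G_n)^2]$. Again, for the sake of simplicity,
we will deal only with square matrices. It is clear that $E[\mathcal{N}_{fbl}(G_n)^2] = \sum_{s \in \mathcal{T}} \sum_{t \in \mathcal{T}} P\{s \in G_n, t \in G_n\}$,
$\mathcal{T}$ being the set of all feedback loops contained in the complete graph on n nodes. Analogously one obtains
$E[\mathcal{N}_{ffl}(G_n)^2]$, taking as $\mathcal{T}$ the set of all feedforward loops. Simple calculations give

\[ E[\mathcal{N}_{fbl}(G_n)^2] = 4\binom{n}{3}\binom{n-3}{3}\mu_n^6 + 12\binom{n}{3}\binom{n-3}{2}\mu_n^4\delta_{2,n} + 6(n-3)\binom{n}{3}\left(\mu_n^3\delta_{2,n} + \mu_n^2\delta_{2,n}^2\right) + 2\binom{n}{3}\left(\mu_n^3 + \delta_{2,n}^3\right) \]

where $\delta_{i,n} := \int_0^1 \theta^i\,\pi_n(d\theta)$, so that $\delta_{1,n} = \mu_n$. As for $\mathcal{N}_{ffl}$, the computations are longer, but essentially the same. The
problem is that $P\{s \in G_n, t \in G_n\}$ can take many different expressions depending on s and t. With
straightforward but tedious calculations one gets

\[ E[\mathcal{N}_{ffl}(G_n)^2] = \binom{n}{3}A_n + (n-3)\binom{n}{3}B_n + \binom{n}{3}\binom{n-3}{2}C_n + \binom{n}{3}\binom{n-3}{3}D_n \]

with

\[ A_n = 6\,\delta_{1,n}\delta_{2,n} + \dots, \]
\[ B_n = 30\,\delta_{1,n}\delta_{2,n}^2 + 18\,\delta_{1,n}^2\delta_{2,n}^2 + 6\,\delta_{2,n}^3 + 18\,\delta_{1,n}^2\delta_{3,n} + 12\,\delta_{1,n}\delta_{2,n}\delta_{3,n} + 6\,\delta_{2,n}\delta_{3,n} + 3\,\delta_{3,n}^2, \]
\[ C_n = 60\,\delta_{1,n}^2\delta_{2,n}^2 + 12\,\delta_{2,n}^3 + 24\,\delta_{1,n}\delta_{2,n}\delta_{3,n} + 12\,\delta_{1,n}^2\delta_{4,n}, \]
\[ D_n = 36\,\delta_{1,n}^2\delta_{2,n}^2. \]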


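Hence,

\[ Var(\mathcal{N}_{fbl}(G_n)) = 12\binom{n}{3}\binom{n-3}{2}\mu_n^4\delta_{2,n} + 6(n-3)\binom{n}{3}\left(\mu_n^3\delta_{2,n} + \mu_n^2\delta_{2,n}^2\right) + 2\binom{n}{3}\left(\mu_n^3 + \delta_{2,n}^3\right) - 4\binom{n}{3}R_n\mu_n^6 \tag{18} \]

and

\[ Var(\mathcal{N}_{ffl}(G_n)) = \binom{n}{3}\left[A_n + (n-3)B_n + \binom{n-3}{2}C_n - R_nD_n\right] \tag{19} \]

with $R_n = \binom{n}{3} - \binom{n-3}{3}$.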
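The variance of the number of feedback loops can be evaluated numerically once the first two moments of $\pi_n$ are known. The sketch below assumes the four-term form of the variance (pairs of loops sharing one, two or three nodes, minus the correction involving $R_n = \binom{n}{3} - \binom{n-3}{3}$); `mu` and `delta2` are illustrative names for $\mu_n$ and $\delta_{2,n}$.

```python
from math import comb


def var_fbl(n, mu, delta2):
    """Variance of the number of feedback loops from the first two moments of pi_n."""
    c3 = comb(n, 3)
    r_n = c3 - comb(n - 3, 3)  # comb returns 0 when n - 3 < 3
    return (
        12 * c3 * comb(n - 3, 2) * mu ** 4 * delta2                      # one shared node
        + 6 * (n - 3) * c3 * (mu ** 3 * delta2 + mu ** 2 * delta2 ** 2)  # two shared nodes
        + 2 * c3 * (mu ** 3 + delta2 ** 3)                               # same three nodes
        - 4 * c3 * r_n * mu ** 6                                         # correction term
    )
```

Two sanity checks are available in closed form. If $\pi_n$ is a point mass at 1 the count is deterministic and the variance vanishes. If $\pi_n$ is a point mass at p and n = 3, the two possible loops use disjoint edges, so the variance is $2p^3(1-p^3)$.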



4.2 Roots and leaves 

We say that i is a root if there is no edge of the kind $j \to i$ but there is at least one edge of the kind
$i \to j$ with $j \ne i$. Loops do not count. Conversely, we say that i is a leaf if there is no edge of the
kind $i \to j$ but there is at least one edge of the kind $j \to i$ with $j \ne i$; again we exclude loops and
isolated points. Let $\mathcal{L}(G_n)$ be the number of leaves in $G_n$ and $\mathcal{R}(G_n)$ the number of roots in $G_n$. Of
course, $\mathcal{L}(G_n) = \sum_i \mathcal{L}_i(G_n)$ and $\mathcal{R}(G_n) = \sum_i \mathcal{R}_i(G_n)$, where $\mathcal{L}_i(G_n)$ is equal to 1 if i is a leaf of
$G_n$ and 0 otherwise and, similarly, $\mathcal{R}_i(G_n) = 1$ if i is a root of $G_n$ and 0 otherwise. It follows that



analogously. 



Hence, 



and 



SH,(G„)=l|f]x,., =ol (l-lJ ^^'^=^ 

i:«(G„) = =ol [i-i| ^j.' = o 



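\[ E[\mathcal{L}_i(G_n)] = (1-\mu_n)^{m_n}\,(1 - P\{S_{n-1,i} = 0\}) \tag{20} \]

\[ E[\mathcal{R}_i(G_n)] = (1 - (1-\mu_n)^{m_n-1})\,P\{S_{n,i} = 0\} \tag{21} \]

and then

\[ E[\mathcal{L}(G_n)] = n(1-\mu_n)^{m_n}(1 - P\{S_{n-1,i} = 0\}), \qquad E[\mathcal{R}(G_n)] = m_n(1 - (1-\mu_n)^{m_n-1})P\{S_{n,i} = 0\}. \]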



4.3 Connected components 

One of the classic and most studied problems in the mathematics of random graphs is the existence
and the size of the so-called giant component (see, for instance, refs. [11, 17, 14], the books [16, 19]
and references therein). This is in principle an important property if one wants to use the ensemble
as a null or positive model for a real-world system. In many empirical instances, such as the Internet,
the World-Wide Web, and many biological networks, the existence of a very large component can be
observed directly. For this reason, a model lacking this property could have limited applications.

Of course in our model the existence of a giant component depends on the choice of the measure
$\pi_n$. A detailed study of this problem is beyond the scope of this work, and it will be dealt with in
future papers. While, for the moment, we have not proved any general theorem, in some interesting
cases, such as the power-law model defined by (2), one can study the problem numerically. In this
case, our simulations indicate the emergence of a giant component for all values of the parameters,
which makes the model attractive for applications (see also [6]). On general grounds, it is not hard to
see that for this example the probability that $G_n$ has only one connected component goes to zero as n
diverges (at least for $\beta > 2$ and square matrices). This is a consequence of a more general proposition.

Proposition 4.1 Let $m_n = n$ and assume that $\lim_{n\to+\infty}(1-\mu_n)^{n-1}P\{S_{n,1} = 0\} = a > 0$; then

\[ \lim_{n\to+\infty} P\{G_n \text{ is connected}\} = 0. \]

Proof. If $Y(n) = \sum_{i=1}^{n} Y_{i,n}$ with $Y_{i,n} = I\{S_{n,i} = 0, Z_{n,i} = 0\}$, then

\[ P\{G_n \text{ is connected}\} \le P\{Y(n) = 0\} \le \frac{Var(Y(n))}{E(Y(n)^2)} = 1 - \frac{(E(Y(n)))^2}{E(Y(n)^2)}. \]

Since $E(Y(n)) = nE(Y_{1,n}) = nP\{S_{n,1} = 0, Z_{n,1} = 0\}$ and

\[ E(Y(n)^2) = nE(Y_{1,n}^2) + n(n-1)E(Y_{1,n}Y_{2,n}) \]
\[ = nP\{S_{n,1} = 0, Z_{n,1} = 0\} + n(n-1)P\{S_{n,1} = 0, Z_{n,1} = 0, S_{n,2} = 0, Z_{n,2} = 0\} \]
\[ = n(1-\mu_n)^{n-1}P\{S_{n,1} = 0\} + n(n-1)(1-\mu_n)^{2n-2}P\{S_{n,1} = 0\}^2, \]

we get

\[ P\{G_n \text{ is connected}\} \le 1 - \frac{(1-\mu_n)^{2n-2}P\{S_{n,1} = 0\}^2}{\frac{n-1}{n}(1-\mu_n)^{2n-2}P\{S_{n,1} = 0\}^2 + \frac{1}{n}(1-\mu_n)^{n-1}P\{S_{n,1} = 0\}}. \]

Taking the limit for $n \to +\infty$ gives the thesis. ◇
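The second inequality in the proof is the standard second-moment bound. For completeness, here is a short derivation for a generic non-negative random variable Y with $0 < E(Y^2) < +\infty$; it uses only the Cauchy-Schwarz inequality and is not specific to the model.

```latex
\begin{align*}
E(Y) = E\bigl(Y\, I\{Y > 0\}\bigr)
  &\le \sqrt{E(Y^2)\, P\{Y > 0\}}
  && \text{(Cauchy--Schwarz)} \\
\Longrightarrow\quad P\{Y > 0\} &\ge \frac{(E(Y))^2}{E(Y^2)} \\
\Longrightarrow\quad P\{Y = 0\} &\le 1 - \frac{(E(Y))^2}{E(Y^2)}
  = \frac{E(Y^2) - (E(Y))^2}{E(Y^2)}
  = \frac{Var(Y)}{E(Y^2)} .
\end{align*}
```

Applied to the number Y(n) of isolated nodes, the bound shows that isolated nodes are present with probability tending to one as soon as $(E(Y(n)))^2 / E(Y(n)^2) \to 1$.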



5 Threshold properties in the kernel of 

Another interesting facet of the exchangeable graph ensemble is its connection with the theory of 
systems of random equations over finite algebraic structures. 

This problem has fairly important applications in the theory of finite state automata, the
theory of coding, cryptography and combinatorial optimization problems (satisfiability, colouring).
Problems of this kind arise in many branches of science, ranging from statistical physics (theory of
glasses) to information theory (e.g. low-density parity-check codes). See, e.g., [21, 22, 33, 36, 39, 40]

[MES]. 

One interesting problem in random linear systems over finite algebraic structures is to prove
a threshold property for the random graph $G_n$ with adjacency matrix $X_n$ of dimension $m_n \times n$. More
precisely, one aims to prove that if $m_n$ and n diverge with $n/m_n \to \gamma < 1$, then an abrupt change in
the behavior of the rank of the matrix $X_n$ occurs when the parameter $\gamma$ exceeds a "critical" value $\gamma_c$.
This property can be expressed in terms of the total number of hypercycles in $G_n$, defined as

\[ S(X_n) = 2^{\,n - \mathrm{rank}(X_n)} - 1 = 2^{\,n - m_n}\,\mathcal{N}(X_n) - 1 \tag{22} \]

where $\mathcal{N}(X_n)$ is the number of nontrivial (i.e. non zero) solutions of the linear system in GF2 (the
field with elements 0 and 1)

\[ X_n^T x =_{GF2} 0. \tag{23} \]
Problems of this kind have been extensively studied for a few ensembles of random graphs; see, for
instance, Theorem 3.5.1 in [33] and Theorem 1 in [35].

In the next proposition, we give an exact expression for the mean value of the number
of solutions of the linear system (23). This expression can be used to prove the existence of a
threshold property for $S(X_n)$. Moreover, the same expression is a first step towards a more exhaustive
characterization of the solution space, which shall be dealt with in a forthcoming paper. All the proofs
of this Section are deferred to the Appendix.

In order to state the next proposition we introduce the following notation. Define

\[ e_n(z) = \int_{[0,1]} (1-2\theta)^z\,\pi_n(d\theta) \]

and

\[ Z_n = \{ j \in \{0, 1, \dots, n\} : e_n(j) = 0 \}. \]

Proposition 5.1 Assume that $X_n$ is a random adjacency matrix of dimension $m_n \times n$ with law (1).
Then

\[ E\mathcal{N}(X_n) = 2^{-n}\sum_{j=0}^{n}\binom{n}{j}\,(1 + e_n(j))^{m_n} \tag{24} \]

whenever $Z_n$ is the empty set.
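The closed form for the mean number of solutions can be checked by exhaustive enumeration on very small matrices. The sketch below assumes the partially exchangeable construction (one parameter $\theta_i$ per row, drawn independently from $\pi_n$, and conditionally independent Bernoulli($\theta_i$) entries along row i), takes $\pi_n$ to be a discrete measure given as a hypothetical list `support` of `(theta, probability)` pairs, and counts all solutions, zero vector included.

```python
from itertools import product
from math import comb


def mean_solutions_formula(m, n, support):
    """2^(-n) * sum_j C(n,j) * (1 + e_n(j))^m with e_n(j) = E[(1 - 2*theta)^j]."""
    def e(j):
        return sum(p * (1 - 2 * t) ** j for t, p in support)

    return sum(comb(n, j) * (1 + e(j)) ** m for j in range(n + 1)) / 2 ** n


def mean_solutions_bruteforce(m, n, support):
    """Exact mean by enumerating row parameters, 0/1 matrices and vectors x."""
    total = 0.0
    for thetas in product(support, repeat=m):
        weight = 1.0
        for _, p in thetas:
            weight *= p
        for bits in product((0, 1), repeat=m * n):
            rows = [bits[i * n:(i + 1) * n] for i in range(m)]
            prob = weight
            for i, (t, _) in enumerate(thetas):
                for b in rows[i]:
                    prob *= t if b else 1 - t
            # number of x with X^T x = 0 over GF(2) for this matrix
            sols = sum(
                all(sum(rows[i][j] * x[i] for i in range(m)) % 2 == 0 for j in range(n))
                for x in product((0, 1), repeat=m)
            )
            total += prob * sols
    return total
```

The enumeration grows like $2^{mn}$, so it is only meant as a correctness check for sizes such as m = 2, n = 3.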

Using the previous result one easily obtains the following large deviation estimate.

Proposition 5.2 If $m_n = [n/\gamma]$ ($\gamma < 1$) and $(T_n)_{n\ge1}$ converges in distribution to a random variable T
with distribution function F, then

\[ \lim_{n\to+\infty}\frac{1}{n}\log(E\mathcal{N}(X_n)) = \sup_{x\in[0,1]}\Theta_\gamma(x) =: I_\gamma \]

with

\[ \Theta_\gamma(x) = \frac{1}{\gamma}\left\{\log\left(1 + \int_{[0,+\infty)} e^{-2xt}\,dF(t)\right) - \gamma\,\bigl(x\log(x) + (1-x)\log(1-x) + \log(2)\bigr)\right\} \]

whenever $Z_n$ is the empty set for n large enough.

Combining the previous result with (22) it is clear that, under the hypotheses of Proposition
5.2, the mean number of hypercycles $E S(X_n)$ can be written as

\[ E S(X_n) = (1+o(1))\bigl(2^{\,n-m_n}e^{nI_\gamma}P_n - 1\bigr) = (1+o(1))\bigl(\exp\{n(I_\gamma - (1/\gamma - 1)\log(2))\}P_n - 1\bigr) \]

where $P_n$ is a function of n which is at most polynomial, i.e. $\frac{1}{n}\log(P_n) = o(1)$ (as $n \to +\infty$). Hence,
if $I_\gamma > (1/\gamma - 1)\log(2)$, it follows that $E S(X_n)$ diverges exponentially in n as n goes to $+\infty$, while
if $I_\gamma = (1/\gamma - 1)\log(2)$ it is sub-exponential, that is, for some $b > 0$, $E S(X_n)/n^b$ goes to zero as n
diverges.

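In point of fact we have the following

Lemma 5.3 If $\int_0^{+\infty} t\,dF(t) < +\infty$ then

\[ \sup_{x\in[0,1]}\Theta_\gamma(x) > \Theta_\gamma(0) = \left(\frac{1}{\gamma} - 1\right)\log(2). \tag{25} \]

If

\[ \log(x)\left(\int_0^{+\infty} t\,e^{-2xt}\,dF(t)\right)^{-1} = o(1) \qquad (x \to 0), \tag{26} \]

then there exists a $\gamma_c$ such that for any $\gamma < \gamma_c$

\[ \sup_{x\in[0,1]}\Theta_\gamma(x) = \Theta_\gamma(0) = \left(\frac{1}{\gamma} - 1\right)\log(2), \tag{27} \]

while for $\gamma > \gamma_c$

\[ \sup_{x\in[0,1]}\Theta_\gamma(x) > \Theta_\gamma(0) = \left(\frac{1}{\gamma} - 1\right)\log(2). \tag{28} \]

In particular, if

\[ 1 - F(t) = t^{-\beta}L(t) \tag{29} \]

with $0 < \beta < 1$ and L a slowly varying function then (26) holds true.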
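The first part of the lemma can be explored numerically. The sketch below evaluates the rate function, taken here as $(1/\gamma)\log(1 + E[e^{-2xT}]) - (x\log x + (1-x)\log(1-x) + \log 2)$, for a limit variable T with finitely many atoms (a simplifying assumption made only for this example) and compares its maximum on a grid with its value at x = 0.

```python
from math import exp, log


def theta_gamma(x, gamma, atoms):
    """Rate function at x in [0, 1]; atoms is a list of (t, probability) pairs for T."""
    laplace = sum(p * exp(-2.0 * x * t) for t, p in atoms)
    entropy = 0.0
    for y in (x, 1.0 - x):
        if y > 0.0:
            entropy += y * log(y)  # convention 0 * log(0) = 0
    return log(1.0 + laplace) / gamma - (entropy + log(2.0))


def sup_on_grid(gamma, atoms, steps=1000):
    """Maximum of the rate function over the grid k / steps, k = 0, ..., steps."""
    return max(theta_gamma(k / steps, gamma, atoms) for k in range(steps + 1))
```

A variable T with finitely many atoms has finite mean, so the grid maximum should exceed the value at the origin, which equals $(1/\gamma - 1)\log 2$. Heavy-tailed choices of F, for which the maximum can sit at the origin, would require a numerical Laplace transform and are not covered by this sketch.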



In other words, under the hypotheses of Proposition 5.2, if (26) holds true, then there exists
a constant $0 < \gamma_c < 1$ such that

\[ \lim_{n\to+\infty} E S(X_n)/n^b = \begin{cases} 0 & \text{for some } b = b(\gamma) > 0 \text{ if } \gamma < \gamma_c \\ +\infty & \text{for every } b > 0 \text{ if } \gamma > \gamma_c. \end{cases} \]

That is, the above mentioned threshold property holds.



6 Other Models 

In this short section we give some comments about the other two models presented in Subsections
2.2 and 2.3.

6.1 Completely exchangeable graphs

Most of the properties and quantities discussed above can be easily established for the totally
exchangeable case. Again, $\mu_n := Q\{X^{(n)}_{i,j} = 1\} = \int_{[0,1]}\theta\,\pi_n(d\theta)$, and, for instance, the degree
distributions (for a square adjacency matrix) are given by

\[ Q\{S_{n,i} = k\} = Q\{Z_{n,j} = k\} = \binom{n}{k}\int_{[0,1]}\theta^k(1-\theta)^{n-k}\,\pi_n(d\theta). \]

Hence, for instance, we have the following

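Proposition 6.1 If $(T_n)_{n\ge1}$ converges in distribution to a random variable T with distribution
function F, then, for every integer $j \ge 1$,

\[ \lim_{n\to+\infty} Q\{S_{n,j} = k\} = E\left[\frac{1}{k!}T^k e^{-T}\right] = \int_0^{+\infty}\frac{t^k}{k!}\,e^{-t}\,dF(t) \qquad (k = 0, 1, \dots) \]

and

\[ \lim_{n\to+\infty} Q\{Z_{n,j} = k\} = E\left[\frac{1}{k!}T^k e^{-T}\right] = \int_0^{+\infty}\frac{t^k}{k!}\,e^{-t}\,dF(t) \qquad (k = 0, 1, \dots). \]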
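The convergence of the binomial mixture to a mixed Poisson law can be illustrated numerically. The sketch below assumes, purely for the sake of the example, that $T = n\theta$ takes finitely many values (a hypothetical list `atoms` of `(t, probability)` pairs) and compares the finite-n degree probability with its limit.

```python
from math import comb, exp, factorial


def binomial_mixture(n, k, atoms):
    """Q{S = k} when theta = T / n and T takes the value t with probability p."""
    return sum(
        p * comb(n, k) * (t / n) ** k * (1 - t / n) ** (n - k) for t, p in atoms
    )


def poisson_mixture(k, atoms):
    """Limit E[T^k e^(-T) / k!] for the same discrete T."""
    return sum(p * t ** k * exp(-t) / factorial(k) for t, p in atoms)
```

For example, with T equal to 1 or 4 with probability 1/2 each, the two functions agree to within about $10^{-3}$ for $n = 10^5$ and small k.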



For this model, quantities such as the mean number of subgraphs, roots and leaves are again
easily computed analytically along the same lines described above. For example, for motifs,

\[ Q\{H \in G_n\} = \int_{[0,1]}\theta^m\,\pi_n(d\theta), \]

when

\[ H = \{ i_1 \to j_{(1,1)}, \dots, i_1 \to j_{(1,m_1)},\; i_2 \to j_{(2,1)}, \dots,\; i_k \to j_{(k,1)}, \dots, i_k \to j_{(k,m_k)} \} \]

with $\sum_{i=1}^{k} m_i = m$. Hence,

\[ E_Q(\mathcal{N}_H(G_n)) = N(H)\,Q\{H \in G_n\}. \]

Finally, throwing triangular matrices with the same algorithm, one can easily generate models for
undirected graphs.



6.2 Hierarchical models 

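One interesting use of this variant is that it can be exploited to produce directed graphs having
power-law tailed in- and out-degree distributions with different exponents. To illustrate this point,
we will consider the following example.

Example 5 If $\gamma > \beta > 2$, $A > 0$, $\lambda_n(d\alpha) \propto I_{[A,n/2]}(\alpha)\,\alpha^{-\gamma}\,d\alpha$ and $\pi_n(d\theta|\alpha) \propto I_{(\alpha/n,1]}(\theta)\,\theta^{-\beta}\,d\theta$, then
$Q^*\{S_{n,i} = k\} = c_1k^{-\beta}(1+o(1))$ and $Q^*\{Z_{n,i} = k\} = c_2k^{-\gamma}(1+o(1))$. Indeed, it is easy to check (by
means of a usual dominated convergence argument) that

\[ \lim_{n\to+\infty} Q^*\{S_{n,i} = k\} = \frac{(\beta-1)(\gamma-1)A^{\gamma-1}}{k!}\int_A^{+\infty}\!\!\int_\alpha^{+\infty}\alpha^{\beta-\gamma-1}\,t^{k-\beta}\,e^{-t}\,dt\,d\alpha \]

and, moreover,

\[ \frac{(\beta-1)(\gamma-1)A^{\gamma-1}}{k!}\int_A^{+\infty}\!\!\int_\alpha^{+\infty}\alpha^{\beta-\gamma-1}\,t^{k-\beta}\,e^{-t}\,dt\,d\alpha = \frac{\gamma-1}{\gamma-\beta}\,p_{A,\beta}(k) - \frac{\beta-1}{\gamma-\beta}\,p_{A,\gamma}(k). \]

In the same way it is easy to check that $\lim_{n\to+\infty} Q^*\{Z_{n,i} = k\} = p_{u,\gamma}(k)$ with $u = A(\beta-1)/(\beta-2)$.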

7 A simple two-parameter model

In this section, we focus our attention on random graphs generated by assuming (2), and we
specialize the results of the previous sections to this two-parameter model. This model has been suggested
by a biological application. Hence, before presenting the results, we briefly recall the main features
of a transcription network.

Transcription networks are directed graphs that represent regulatory interactions between
genes. Specifically, the link $a \to b$ exists if the protein coded by gene a affects the transcription of
gene b in mRNA form by binding along DNA in a site upstream of its coding region$^4$. For a few
organisms, such as E. coli and S. cerevisiae, a significant fraction of the wiring diagram of this network
is known [34, 29, 52, 30]. The topological features of the graphs can be studied to infer information
on the large-scale architecture and evolution of gene regulation in living systems. For instance, the
connectivity and the clustering coefficient have been considered [29]. For this kind of analysis one
has to consider null ensembles of random networks that share some topological invariant with the
empirical case. The idea behind it is to establish when and to what extent the empirical topology
deviates from the "typical case" statistics of the null ensemble. For example, a topological feature
that has led to relevant biological findings, in particular for transcription, is the occurrence of small
subgraphs - or "network motifs" gH |3H1 . 

As usual in statistical studies, the choice of the invariant properties for the randomized
counterpart is delicate. For instance, the null ensemble used for motif discovery usually conserves the
degree sequences of the original network. The observed degree sequences for the known transcription
networks roughly follow a power-law distribution for the out-degree, with exponent between one and
two, while being Poissonian in the in-degree [29, 18]. These features suggest considering
alternative null models for directed random graphs with Poisson in-degree distribution and (approximately)
power-law out-degree distribution, which can be easily generated with our model under (2). In the
remainder of the paper, we will discuss this case in more detail, showing explicit calculations of the
observables discussed in the previous sections.

7.1 In and out connectivity 

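By simple calculations from (2), we get the following. If $1 < \beta < 2$, then

\[ \mu_n = \frac{\beta-1}{2-\beta}\left(\frac{\alpha}{n}\right)^{\beta-1}(1+o(1)) \qquad \text{as } n \to +\infty. \]

If $\beta = 2$, then

\[ \mu_n = \frac{\alpha}{n-\alpha}(\log n - \log\alpha) = \alpha\,\frac{\log n}{n}\,(1+o(1)) \qquad \text{as } n \to +\infty. \]

If $\beta > 2$, then

\[ \mu_n = \frac{\alpha}{n}\,\frac{\beta-1}{\beta-2}\,(1+o(1)) \qquad \text{as } n \to +\infty. \]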
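The three scalings of the mean connection probability can be compared with the exact value. The sketch below assumes that the two-parameter model takes $\pi_n$ with density proportional to $\theta^{-\beta}$ on $(\alpha/n, 1]$, in which case the mean is a ratio of two elementary integrals; the function names are illustrative.

```python
from math import log


def mu_exact(n, alpha, beta):
    """Mean of theta under a density proportional to theta^(-beta) on (alpha/n, 1]."""
    a = alpha / n
    norm = (a ** (1 - beta) - 1) / (beta - 1)        # integral of theta^(-beta)
    if beta == 2:
        first = -log(a)                              # integral of theta^(-1)
    else:
        first = (1 - a ** (2 - beta)) / (2 - beta)   # integral of theta^(1 - beta)
    return first / norm


def mu_asymptotic(n, alpha, beta):
    """Leading-order behaviour of the mean in the three regimes of beta."""
    if beta < 2:
        return (beta - 1) / (2 - beta) * (alpha / n) ** (beta - 1)
    if beta == 2:
        return alpha * log(n) / n
    return alpha * (beta - 1) / (beta - 2) / n
```

For $\beta = 2$ the relative error decays only like $\log\alpha / \log n$, so the agreement is slow unless $\alpha = 1$.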



The next proposition, which is a consequence of Proposition 3.1, shows that $S_{n,i}$ is asymptotically
power-law distributed, while $Z_{n,i}$, at least with a suitable choice of $m_n$, is asymptotically
Poisson distributed. One has to distinguish among the different possible scalings for $\mu_n$. More
precisely, we have the following

Proposition 7.1 Assume that (2) holds true. Then, for every $\alpha > 0$ and $\beta > 1$,

\[ \lim_{n\to+\infty} P\{S_{n,j} = k\} = p_{\alpha,\beta}(k) \qquad (j > 0,\ k \ge 0). \]

Moreover, if $\beta > 2$ and $m_n = [\delta n]$ ($\delta \in (0,1]$, $[y]$ being the integer part of y)

\[ \lim_{n\to+\infty} P\{Z_{m_n,j} = k\} = \frac{e^{-\lambda}\lambda^k}{k!} \qquad (j > 0,\ k \ge 0), \]

where $\lambda = \delta\alpha(\beta-1)/(\beta-2)$. If $\beta = 2$ and $m_n = [\delta n/\log(n)]$

\[ \lim_{n\to+\infty} P\{Z_{m_n,j} = k\} = \frac{e^{-\delta\alpha}(\delta\alpha)^k}{k!} \qquad (j > 0,\ k \ge 0). \]

If $1 < \beta < 2$ and $m_n = [\delta n^{\beta-1}]$

\[ \lim_{n\to+\infty} P\{Z_{m_n,j} = k\} = \frac{e^{-\lambda}\lambda^k}{k!} \qquad (j > 0,\ k \ge 0) \]

where $\lambda = \delta\alpha^{\beta-1}(\beta-1)/(2-\beta)$.

It is worth noticing that requiring an out-degree distribution with a power-law tail
and divergent mean ($\beta < 2$) poses a heavy constraint on the number of regulator
nodes (the rows of the matrix).
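For simulations, a graph from this ensemble can be generated by inverse-transform sampling of the row parameters. The sketch below is a minimal illustration, assuming that the two-parameter model takes $\pi_n$ with density proportional to $\theta^{-\beta}$ on $(\alpha/n, 1]$, one parameter per row, and conditionally independent Bernoulli entries along each row; all function names are hypothetical.

```python
import random


def sample_theta(n, alpha, beta, rng):
    """Inverse-transform sample from a density proportional to theta^(-beta) on (alpha/n, 1]."""
    a = alpha / n
    u = rng.random()
    lo = a ** (1 - beta)  # value of theta^(1 - beta) at the left end point
    return (lo - u * (lo - 1)) ** (1 / (1 - beta))


def sample_graph(m, n, alpha, beta, seed=0):
    """m x n adjacency matrix: row i has i.i.d. Bernoulli(theta_i) entries."""
    rng = random.Random(seed)
    thetas = [sample_theta(n, alpha, beta, rng) for _ in range(m)]
    return [[1 if rng.random() < t else 0 for _ in range(n)] for t in thetas]


def out_degrees(adj):
    """Row sums, i.e. the out-degrees S_{n,i}."""
    return [sum(row) for row in adj]


def in_degrees(adj):
    """Column sums, i.e. the in-degrees Z_{m,j}."""
    return [sum(col) for col in zip(*adj)]
```

Histograms of `out_degrees` and `in_degrees` over many samples can then be compared with the power-law and Poisson limits of the proposition, keeping in mind that the choice of the number of rows m relative to n depends on the regime of $\beta$.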

7.2 Subgraphs 

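We will discuss mainly the case of square matrices, where calculations are simpler and conceptually
equivalent.

7.2.1 k-cycles

Under (2), using (15), if $\beta > 2$

\[ \lim_{n\to+\infty}\frac{1}{2}E(\mathcal{N}_{C_k}(G_n)) = \frac{1}{k!}\left[\frac{\alpha(\beta-1)}{\beta-2}\right]^k, \]

if $\beta = 2$

\[ \lim_{n\to+\infty}\frac{1}{2(\log n)^k}E(\mathcal{N}_{C_k}(G_n)) = \frac{\alpha^k}{k!}, \]

and if $1 < \beta < 2$

\[ \lim_{n\to+\infty}\frac{1}{2n^{k(2-\beta)}}E(\mathcal{N}_{C_k}(G_n)) = \frac{1}{k!}\left[\frac{(\beta-1)\alpha^{\beta-1}}{2-\beta}\right]^k. \]

7.2.2 Triangles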



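The feedforward loop is a classical example of "network motif", i.e. it is overrepresented in known
transcription networks. Conversely, feedback loops (which in principle could form switches and
oscillators) are usually underrepresented ("anti-motifs") in transcription networks [53, 42].

Here, we evaluate, for our model, the mean number of feedback loops versus feedforward
loops. Under (2), (17) yields

\[ E\mathcal{N}_{ffl}(G_n) = 6\binom{n}{3}\frac{(\beta-1)^2}{[(n/\alpha)^{\beta-1} - 1]^2}\int_{\alpha/n}^{1}\theta^{2-\beta}\,d\theta\int_{\alpha/n}^{1}\theta^{1-\beta}\,d\theta. \]

Hence:

If $\beta > 3$, then

\[ \lim_{n\to+\infty}E(\mathcal{N}_{ffl}(G_n)) = \frac{\alpha^3(\beta-1)^2}{(\beta-3)(\beta-2)} > 3\lim_{n\to+\infty}E(\mathcal{N}_{fbl}(G_n)). \]

If $\beta = 3$

\[ \lim_{n\to+\infty}\frac{1}{\log n}E(\mathcal{N}_{ffl}(G_n)) = 4\alpha^3. \]

If $2 < \beta < 3$

\[ \lim_{n\to+\infty}\frac{1}{n^{3-\beta}}E(\mathcal{N}_{ffl}(G_n)) = \frac{\alpha^{\beta}(\beta-1)^2}{(3-\beta)(\beta-2)}. \]

If $\beta = 2$

\[ \lim_{n\to+\infty}\frac{1}{n\log n}E(\mathcal{N}_{ffl}(G_n)) = \alpha^2. \]

Finally, if $1 < \beta < 2$

\[ \lim_{n\to+\infty}\frac{1}{n^{5-2\beta}}E(\mathcal{N}_{ffl}(G_n)) = \frac{\alpha^{2\beta-2}(\beta-1)^2}{(3-\beta)(2-\beta)}. \]

At this stage one can give the scaling behavior of the ratio of the mean number of feedforward
loops to the mean number of feedback loops, which is

\[ \frac{E\mathcal{N}_{ffl}(G_n)}{E\mathcal{N}_{fbl}(G_n)} \sim \begin{cases} n^{\beta-1} & \text{if } 1 < \beta < 2 \\ n/(\log n)^2 & \text{if } \beta = 2 \\ n^{3-\beta} & \text{if } 2 < \beta < 3 \\ \log n & \text{if } \beta = 3 \\ A & \text{if } \beta > 3 \end{cases} \]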



where $A = 3(\beta-2)^2(\beta-3)^{-1}(\beta-1)^{-1} > 1$. Here and in what follows we use $a_n \sim b_n$ to denote
$a_n = b_n(1+o(1))$ as $n \to +\infty$. Thus, the ffl always dominates, although there is a wide range
of regimes. Note that the dominance of feedforward triangles is even stronger if one considers the
rectangular adjacency matrices discussed above. For example, for $1 < \beta < 2$, and rectangular matrices
with $m_n = n^{\beta-1}$, we calculate

EMfl(Gn) 

EMbi(G'„) 
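For square matrices, the ratio of the two mean values can also be evaluated exactly at finite n, which is a convenient way to see how fast the asymptotic regimes set in. The sketch below assumes a density proportional to $\theta^{-\beta}$ on $(\alpha/n, 1]$ for $\pi_n$ and uses the mean values $2\binom{n}{3}\mu_n^3$ for feedback loops and $6\binom{n}{3}\delta_{2,n}\mu_n$ for feedforward loops; `moment` and `ffl_over_fbl` are illustrative names.

```python
from math import comb, log


def moment(k, n, alpha, beta):
    """k-th moment of theta under a density proportional to theta^(-beta) on (alpha/n, 1]."""
    a = alpha / n

    def integral(p):  # integral of theta^p over (a, 1]
        return -log(a) if p == -1 else (1 - a ** (p + 1)) / (p + 1)

    return integral(k - beta) / integral(-beta)


def ffl_over_fbl(n, alpha, beta):
    """Exact ratio of the mean numbers of feedforward and feedback loops (square case)."""
    mu = moment(1, n, alpha, beta)
    delta2 = moment(2, n, alpha, beta)
    ffl = 6 * comb(n, 3) * delta2 * mu
    fbl = 2 * comb(n, 3) * mu ** 3
    return ffl / fbl
```

For $\beta = 4$ the constant A equals 4, and the exact ratio at $n = 10^6$ with $\alpha = 1$ is already within $10^{-3}$ of it; for $\beta = 3$ the ratio grows proportionally to $\log n$.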

As for the variances, for instance, one obtains 



\/ar(Mbi(G,0) 



^5(2-/3) 



3^"' p-2' 



if 1< /? < 2 

if ^ = 2 
if /3 > 2. 



7.3 Roots and leaves 

By simple computations, from (I21|l we obtain: 
If 1 < /3 < 2 then 



lim ■ 



and hence E($Hi(G„)) 
If /3 = 2 then 

and hence E($Hi(G„)) 
If /3 > 2 then 



-fcirv/'-lr,2-/3 



lim ; 



log[E(lH.(G„))] = - 



/3- 1/3-1 
a 

2-/3 



logn 



log[E($H,(G„))] = 



limE(fH,(G„)) = (1 - e-F^")p<,,^(0). 

Analogously, from (|20|l . we derive: 
If 1 < /3 < 2 then 

lim i log[l - E(£.(G„))p<.,^(0)-'] = - f - 1 ^^-s-i 



and hence E(£,(G„)) ~ (1 - e"^"""'"-'"" )p„,^(0). 



2-/3 
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If /3 = 2 then 



lim J— log[l - E(£.(G„))p„.^(0)-i] 
n log n 



and hence E(£4G„)) ~ (1 - ;irK,/3(0). 
If /3 > 2 then 

limE(£,(G„)) = (l-e-f^")(l-p,,^(0)). 

n 

Combining all the previous statements, we get E(£(Gn)) ~ n while 
E(SH(G„)) ' 



^-x^n^-" [n</3<2 
n^-" up = 2 

n if/3 > 2 



where = §E^a^-'. 

In concrete applications, these properties can be used for example to impose a well-defined 
scaling for the roots-to-leaves ratio of the null network ensemble. 

7.4 The hub 

In Section 3.2 we have already explored the implications of a power-law distributed out-degree for
the limit laws of the maximally connected node. Using those results under (2), it is possible to prove an
explicit limit theorem for the size of the hub.

Proposition 7.2 For $\beta > 2$ and for every positive number x

\[ \lim_{n\to+\infty} P\{H_n/b_n \le x\} = e^{-(\alpha/x)^{\beta-1}} \tag{30} \]

with $m_n = n$ and $b_n = n^{1/(\beta-1)}$. For $\beta = 2$ and for every positive number x

\[ \lim_{n\to+\infty} P\{H_n/b_n \le x\} = e^{-\alpha/x} \]

with $m_n = b_n = n/\log n$. Finally, for $1 < \beta < 2$ and $m_n = n^{\beta-1}$,

\[ \lim_{n\to+\infty} P\{H_n/n \le x\} = e^{-(\alpha/x)^{\beta-1}}\,I_{(0,1)}(x) + I_{[1,+\infty)}(x) \]

for every positive x.
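The first statement can be checked numerically in a simplified setting. The sketch below makes two assumptions that are not part of the proposition: $\pi_n$ has density proportional to $\theta^{-\beta}$ on $(\alpha/n, 1]$, and the out-degree of row i is replaced by its conditional mean $n\theta_i$, so that the binomial fluctuations around $n\theta_i$ are ignored. Under these assumptions the distribution of the rescaled maximum over n independent rows is available in closed form and can be compared with the limit for $\beta > 2$.

```python
from math import exp, log1p


def tail_scaled_theta(t, n, alpha, beta):
    """P{n * theta > t} for a density proportional to theta^(-beta) on (alpha/n, 1]."""
    if t <= alpha:
        return 1.0
    if t >= n:
        return 0.0
    return ((t / n) ** (1 - beta) - 1) / ((n / alpha) ** (beta - 1) - 1)


def max_cdf(x, n, alpha, beta):
    """P{max_i n * theta_i <= x * n^(1/(beta-1))} over n independent rows (beta > 2)."""
    b_n = n ** (1 / (beta - 1))
    tail = tail_scaled_theta(x * b_n, n, alpha, beta)
    if tail >= 1.0:
        return 0.0
    return exp(n * log1p(-tail))


def frechet_limit(x, alpha, beta):
    """Limit distribution function exp(-(alpha / x)^(beta - 1)) for x > 0."""
    return exp(-((alpha / x) ** (beta - 1)))
```

With $\beta = 3$, $\alpha = 1$ and $n = 10^6$, the finite-n value and the limit agree to several decimal places for x of order one.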

Remark 2 (a) Recall that $e^{-(\alpha/x)^{\beta-1}}\,I_{[0,+\infty)}(x)$ is the Fréchet type II extreme value distribution, that
is, one of the three kinds of extreme value distributions which can arise as the limit law of the maximum of
independent and identically distributed random variables.

(b) Note that in the last case the limit distribution is not exactly of extreme value kind, and
the probability of finding a hub of size n is asymptotically finite and equal to $1 - e^{-\alpha^{\beta-1}}$. This
concentration effect was already noted in [31] for another kind of random graph ensemble.

Proof of Proposition 7.2. Let $\beta > 2$. In the same notation of the proof of Proposition 3.3,

^ ^ 



F:{t) = l{a/n < t} ^_f_\_, it'-^ - 1) +I{a/n > t}, 



hence rj — l3 — 1, 

_/3-l 

^0-1 

I — P 

and 



r, 



71/3-1 1 _ (q,/„)/3- 

The thesis follows by Proposition 13.31 and Lemma [3.41 noticing that 



\rn{t)\ < C {( j^, + >t} + ^) 



Arguing essentially in the same way one can prove the statements for /3 < 2. 

For $\beta > 2$ one can guess that $E[H_n] \sim n^{1/(\beta-1)}$, as claimed in [31] in the analysis of another
scale-free random graph ensemble. In point of fact, we have the following

Proposition 7.3 If $\beta > 2$ and d is such that $\beta - d > 1$ then

lim nn-'"^^-'^Ht] = (/3 - l)^Q2r ( tll^ 



Proof. We begin with the case d = 1. In Proposition 7.2 we have just proved that $(Y_n)_{n\ge1} :=
(H_n/n^{\gamma})_{n\ge1}$ converges in distribution with $\gamma = 1/(\beta-1)$. So, it is enough to prove that $(Y_n)_{n\ge1}$ is
uniformly integrable, i.e.

\[ \lim_{L\to+\infty}\sup_n E\bigl[|Y_n|\,I\{|Y_n| \ge L\}\bigr] = 0. \]

See for instance Lemma 4.11 in [3^. Note, first, that 

/ + 00 
(1 - P{H„/n^ < x})dx. 

Now by pO)) 

LP{H„/n'' >L}< CiL(l - e""""'^'"") 

for a suitable constant $C_1$. Hence $\lim_{L\to+\infty}\sup_n L\,P\{H_n/n^{\gamma} > L\} = 0$. As for the second term, setting
$F_{S_n}(x) = P\{S_n \le x\}$, one has

\[ 1 - P\{H_n/n^{\gamma} \le x\} = 1 - [F_{S_n}(xn^{\gamma})]^n \]
\[ = 1 - \exp\{n\log(F_{S_n}(xn^{\gamma}))\} \]
\[ \text{[using } 1 - e^{x} \le -x\text{]} \]
\[ \le -n\log\bigl(1 - (1 - F_{S_n}(xn^{\gamma}))\bigr) \]
\[ \le (1 + C_2)\,n\,(1 - F_{S_n}(xn^{\gamma})). \]

Hence,

p-\-oc r-\-QO 

/ (1- P{Hn/n'' < x})dx < {I + C2) n{l~ Fs„{xn'^'))dx - 

J L J L 



I L 

/3-2 

Since 1 — Fs„ (xn?) = if xrP > n, that is if a; > n'^^ , 



Now, if L > (/3 — then [a;n^] + 1 > for every x > L, hence 



I„<C3n-^ l^]B{n-k + l,k^P+l)dx 

^ 2-p r(n + i) ^ r(fc + 



fc=[a:nT] + l 



at least for L large enough. Since, 
it follows that 



In<C4 I ( 1 dx< C5- 



13-2 ■ 



The proof of the case with d > 1 follows an identical procedure, with $x^{1/d}$ in place of x and $L^{1/d}$ in
place of L.

7.5 Random linear system in GF2 

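Under (2) one has

\[ F(x) = \alpha^{\beta-1}(\beta-1)\int_{\alpha}^{x} t^{-\beta}\,dt = 1 - \left(\frac{\alpha}{x}\right)^{\beta-1} \qquad (x > \alpha). \]

Hence, applying Lemma 5.3 one has that, if $1 < \beta < 2$, then there exists a constant $\gamma_c(\beta)$ such that

\[ \lim_{n\to+\infty} E S(X_n)/n^b = \begin{cases} 0 & \text{for some } b = b(\gamma) \text{ if } \gamma < \gamma_c(\beta) \\ +\infty & \text{for every } b > 0 \text{ if } \gamma > \gamma_c(\beta), \end{cases} \]

while if $\beta > 2$ no threshold property holds since $\int_0^{+\infty} x\,dF(x) < +\infty$.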



A Appendix 



Proof of Lemma \3.S\ Let k > j, k being an integer. By hypothesis 

r(k + l) f+°° t^-T" -t , 



r(fc-7 + i)'^^ ^ r(fc-7 + i) 
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where G{x) = J^^ t''dF{t). Summing both sides on k, one can write 

Af+H+i p., .+00 

fc=[7l + l ^ ' ' ° 

with 

and v-y := [7] + 1 — 7. Hence, 

E ^ ^,,M{t)e~'dG{t). (31) 

Now, for every t > Q and in (0, 1), by 5.2.7.20 in 48], one has 
Jirny-„,,{t) = E r(m + ., + l) = 

m — 

where 

1 

r(;^7) Jo 

Moreover, (j>-y,M{t) > and the convergence is clearly monotone. Hence, taking the hmit as M goes 
to +00 in (|31[) . by monotone convergence one obtains 

with /+°° g{u^,t)t'<dF{t) < +00 if and only if r(m + [7] + 2)r(m + + l)-ip„+i+[^] < +00. 

Now, since g{i/j,x) is a distribution function, one has J^'^°° g{i'^,t)t'^dF{t) < +00 if and only if 
J^°° fdFit) < +00. Moreover, since r(m + [7] + 2)r(m + ly^ + 1)"^ = (m + [7] + l)>,„+i+[.^] (1 + 
0(1)) as m +00, Em=o + [7] + 2)r(m + + l)"^p,„+i+[^] < +00 if and only if EX=o("^ + 
[7] + Pm+i+i-y] < +00, which proves the lemma. 

Proof of Proposition 5.1. Denote by $M(m_n, n)$ the set of all $m_n \times n$ adjacency matrices.
The number of solutions of the linear system $X_n^T x =_{GF2} 0$ is defined as

\[ \mathcal{N}(X_n) = \sum_{x \in GF2^{m_n}} I\{X_n^T x =_{GF2} 0\}. \]

Now note that

\[ I\{x =_{GF2} 0\} = \frac{1 + (-1)^x}{2} \]

and write

\[ E\mathcal{N}(X_n) = \sum_{A_n \in M(m_n,n)} P\{X_n = A_n\}\sum_{x \in GF2^{m_n}}\prod_{j=1}^{n}\frac{1 + (-1)^{x\cdot(A_n)_j}}{2}. \]

Using (1), rewrite the last expression as
EAA(X„) 

„ rm-fi 



n (i + 



= 2" 



E 



where {A„)j — {(^n)ij, ■ ■ • , (^n)m„i} and {A„)ij is the element in position {i,j) of matrix An- Since 
the above expression in square brackets is independent of j, EA/'(X„) can be written as 



E7V(X„ 



= 2 " 
At this stage note that 

and then 



[0,1]™" 













" E ( 









aSGF"'" »=1 



^,™™„ i[0.1]" 



1+ E (-i)^-""'^-nc'(i 



2-" ^ 



and after summing over we have 



1 + n E ((-ir^.r^i-^o'-"- 



EAr(X„) = 2-" ^ / 



(32) 



Now, using 



I{a; =GF2 0} = 



l-(-l)^ 



where a: = a; + 1 in GF2, expression (I32|) can be written as 



EA/-(X„)=2-" ^ / 



1 + n (1 - 26ia{i, =GF2 0}) 



Moreover, since 



(1 - 26, l{x, =GF2 0}) = (1 - 200'^'^'^''"^^°' 
we can rewrite the mean number as 



EAA(X„) = 2-" V / 



„ ■^[0,1]'^ 



1+n (1-2^0'^"'"''^=°^ 



31 



After the expansion of the last square bracket we obtain 

EAA(X„)=2-"X r n E / ^4^^0(1-2^0'*^'^''"^"*' 

J = l V } «=l:rieGF2'^[0'll 
= 2'"E " n E ?n =GF. Ob) . 

Finally, it is easy to see that the last sum is independent of i. Then 



(H'^ =GF, 0}j) 

t7gGF2 



<> 



Proof Provosition \5 .'A First of all, observe that 



exp{n'i/'n(j7n)} 



where 



2r„ 



= log(l + U^n)) = '-^ log(l + E[(l - ^)-"]). 
n n n 



Now recall that one of the most classical example of large deviation estimate is 

T7log(EpV''^""^"') = [/W-{^logW + (l-^)log(l-x) + log(2)}] 

whenever \\vciM^+ao sup^gjQ \ fM{x) — f(x)\ — 0, f being a continuous function on [0, 1]. See, e.g., 
Theorem 7.1 and 10.2 in [20) . Hence, the thesis follows if we prove that for every K < +oo 



(33) 



lim sup IVn(x) - - log 1 + / e-^"dF{i) 1=0 



[0, + oo) 



To prove (|33p it is enough to prove that for every K < +oo 

Im sup E l-—— -e = 0. 



(34) 



Since Tm converges weakly to T and e ' is a bounded and continuous function on [0, +00), then 

lim Ele"^"^" -e"^"^! =0. (35) 

Moreover we claim that 



lun El(l-^)*^-e-^^-l 



0. 



(36) 



To prove this last claim, set (^„(a;) = (1— ^)" and note that converges uniformly on every compact 
set to . Hence, given K, 

lim sup |<^M(a;) — e~^| = 0. 

M-. + OO |,t;|<A' 
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Moreover, since {Tm)m>i is tight, for every e there exists K > such that supJ^J>J^ P{\Tm\ > K} < e. 
Now 1(1 - - e-2^^f I < 2 and then 

lim E\{l~^f' -e-^'^"'\< hm [ sup \(I>m{x) ^ (l>{x)\ + 2P{\Tm\ > K}] < 2€. 

A/— > + cx) M M^ + oo |3;|<x 

That is (|36p . Finally, given a, 6 in [—1, 1] and a; > 

|a^— 6^|< sup \—y^\\a — b\=x\a~b\, 
t/e[-i,i] "-y 

hence, since < Tm < M, one has — 1 < 1 — 2ZiL < i and then 



|E[(1 _ _ g-2T.j, < - _ e-2TM-| + E\e-'''^''^ - e'^^^l 



(37) 



Combining (|35l), ((Ml) and jSZl we get ((Ml)- 

Proof of Lemma 5.3. Note that, for every x in (0, 1),

\[ \Theta'_\gamma(x) = \frac{1}{\gamma}\left\{-\frac{2\int_0^{+\infty} t\,e^{-2xt}\,dF(t)}{1 + \int_0^{+\infty} e^{-2xt}\,dF(t)} - \gamma\log x + \gamma\log(1-x)\right\}. \]

Hence, if $\int_0^{+\infty} t\,dF(t) < +\infty$ then $\lim_{x\to0^+}\Theta'_\gamma(x) = +\infty$, and then $\Theta_\gamma$ is strictly increasing in a
neighborhood of 0. This last fact implies that

\[ \sup_{x\in[0,1]}\Theta_\gamma(x) > \Theta_\gamma(0) = \left(\frac{1}{\gamma} - 1\right)\log(2). \]

If (26) holds true then $\lim_{x\to0^+}\Theta'_\gamma(x) = -\infty$, and hence there exists $\gamma_c$ such that for any $\gamma < \gamma_c$

\[ \sup_{x\in[0,1]}\Theta_\gamma(x) = \Theta_\gamma(0) = \left(\frac{1}{\gamma} - 1\right)\log(2). \]

Now set

\[ A(x) = \int_0^{x} t\,dF(t), \qquad \text{and} \qquad H(s) := \int_0^{+\infty} t\,e^{-st}\,dF(t) = \int_0^{+\infty} e^{-st}\,dA(t). \]

The well-known Karamata tauberian theorem (see, e.g. [24]) yields that, given $\sigma > 0$ and L slowly
varying, $H(s) \sim s^{-\sigma}L(1/s)$ as s goes to 0 if and only if $A(x) \sim x^{\sigma}L(x)/\Gamma(1+\sigma)$ as x goes to $+\infty$.

Hence, it remains to prove that if (29) holds true then $A(x) \sim x^{\sigma}L(x)/\Gamma(1+\sigma)$. Observe that

\[ A(x) = -L(x)\,x^{1-\beta} + \int_0^{x} s^{-\beta}L(s)\,ds. \]

At this stage the claim follows since it is easy to check that $\int_0^{x} s^{-\beta}L(s)\,ds = x^{1-\beta}\tilde{L}(x)$, where $\tilde{L}(x)$ is
still slowly varying.